Title: Enrichment of statistical power for genome-wide association studies

Author
LI, MENG - Nanjing Agricultural University
LIU, XIAOLEI - Cornell University
Bradbury, Peter
YU, JIANMING - Kansas State University
ZHANG, YUAN-MING - Nanjing Agricultural University
TODHUNTER, RORY - Cornell University
Buckler, Edward - Ed
ZHANG, ZHIWU - Washington State University

Submitted to: BMC Plant Biology
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 9/9/2014
Publication Date: 10/17/2014
Citation: Li, M., Liu, X., Bradbury, P., Yu, J., Zhang, Y., Todhunter, R., Buckler IV, E.S., Zhang, Z. 2014. Enrichment of statistical power for genome-wide association studies. Biomed Central (BMC) Plant Biology. 12:73.

Interpretive Summary: Genome-wide association studies (GWAS) are widely used in genetics research to identify genes that affect desirable traits such as disease resistance, yield, and nutritional quality. If a group of related individuals is better or worse for a trait than the population average, any genes they share will appear to influence that trait, whether or not those genes actually affect it. This confounding is referred to as population structure. One of the most effective statistical methods for correcting for population structure is the mixed linear model (MLM). This paper describes a modification of MLM called compressed MLM (CMLM). By grouping genetically similar individuals, CMLM improves the ability to detect causal genes and requires less computer time than MLM. This research used data for body mass in humans, hip dysplasia in dogs, and flowering time in maize and Arabidopsis to demonstrate these advantages. The method has been implemented in GAPIT, a popular software package for GWAS.
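
The confounding described above can be illustrated with a small sketch. The data below are hypothetical toy values, not taken from the study, and the `pearson` helper is written here only for illustration. The marker has no relationship to the trait inside either subpopulation, yet it looks associated with the trait once the two subpopulations are pooled, because both allele frequency and trait mean differ between them.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical toy data: marker genotype (0/1) and trait value per individual.
# Subpopulation A: marker allele is rare, trait mean is low.
marker_a = [0, 0, 0, 1]
trait_a = [1.0, 2.0, 3.0, 2.0]
# Subpopulation B: marker allele is common, trait mean is high.
marker_b = [1, 1, 1, 0]
trait_b = [5.0, 6.0, 7.0, 6.0]

# Within each subpopulation the marker is uncorrelated with the trait.
r_within_a = pearson(marker_a, trait_a)
r_within_b = pearson(marker_b, trait_b)

# Pooling the subpopulations creates a spurious marker-trait correlation.
r_pooled = pearson(marker_a + marker_b, trait_a + trait_b)

print(r_within_a, r_within_b, r_pooled)
```

Here the pooled correlation comes out at about 0.47 even though both within-subpopulation correlations are zero, which is the kind of spurious signal that a structure-aware model is meant to remove.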

Technical Abstract: The inheritance of most human diseases and agriculturally important traits is controlled by many genes with small effects. Identifying these genes while simultaneously controlling false positives is challenging. Among available statistical methods, the mixed linear model (MLM) has been the most flexible and powerful for controlling population structure and unequal relatedness among individuals (kinship), the two common causes of spurious associations. The introduction of the compressed MLM (CMLM) method provided additional opportunities for optimization by adding two new model parameters: the grouping algorithm and the number of groups. This study introduces another model parameter to develop an enriched CMLM (ECMLM). The parameter is the algorithm used to define kinship between groups (that is, the kinship algorithm). The ECMLM calculates kinship using several different algorithms and then chooses the best combination of kinship algorithm and grouping algorithm. Simulations show that the ECMLM increases statistical power. In some cases, the power gained by using ECMLM instead of CMLM is larger than the improvement gained by using CMLM instead of MLM.
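
The idea of trying several kinship algorithms against several groupings can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the kinship algorithms or the selection criterion, so the four summary functions in `SUMMARIES`, the helper names `group_kinship` and `choose_combination`, and the user-supplied `score` callable (lower is better, standing in for whatever model-fit criterion is actually used) are all hypothetical.

```python
from statistics import mean, median

# Assumed candidate kinship algorithms: ways to summarize the pairwise
# kinship values between the members of two groups.
SUMMARIES = {"mean": mean, "median": median, "max": max, "min": min}


def group_kinship(kinship, groups, summarize):
    """Compress an individual-level kinship matrix to group level.

    kinship   -- square list-of-lists, one row per individual
    groups    -- group label for each individual, same order as kinship
    summarize -- function reducing a list of pairwise values to one number
    Within-group entries include each individual's kinship with itself.
    """
    labels = sorted(set(groups))
    members = {g: [i for i, lab in enumerate(groups) if lab == g]
               for g in labels}
    return [
        [summarize([kinship[i][j] for i in members[g] for j in members[h]])
         for h in labels]
        for g in labels
    ]


def choose_combination(kinship, groupings, score):
    """Try every (grouping, kinship algorithm) pair; keep the lowest score.

    groupings -- dict mapping a grouping name to its list of group labels
    score     -- hypothetical callable(group_matrix, groups) -> float
    """
    best = None
    for grouping_name, groups in groupings.items():
        for kinship_name, summarize in SUMMARIES.items():
            matrix = group_kinship(kinship, groups, summarize)
            value = score(matrix, groups)
            if best is None or value < best[0]:
                best = (value, grouping_name, kinship_name, matrix)
    return best


# Hypothetical toy kinship matrix for four individuals.
K = [
    [1.0, 0.8, 0.1, 0.2],
    [0.8, 1.0, 0.3, 0.0],
    [0.1, 0.3, 1.0, 0.6],
    [0.2, 0.0, 0.6, 1.0],
]
groupings = {"two_pairs": [0, 0, 1, 1]}

# Placeholder score used only to make the example runnable: it prefers
# the smallest between-group kinship. A real analysis would fit the
# mixed model and compare fit statistics instead.
best = choose_combination(K, groupings, lambda m, g: m[0][1])
print(best[1], best[2])
```

With the placeholder score, the search selects the `min` summary for the single grouping supplied; swapping in a real model-fit criterion changes only the `score` argument, not the search loop.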