Author
ZHANG, ZHIWU - Cornell University
ERSOZ, ELHAN - Cornell University
LAI, CHAO-QIANG - Tufts University
TODHUNTER, R.J. - Cornell University
TIWARI, HEMANT - University of Alabama
Gore, Michael
Bradbury, Peter
YU, JIANMING - Kansas State University
ARNETT, D.K. - Tufts University
ORDOVAS, JOSE - Tufts University
Buckler, Edward - Ed
Submitted to: Nature Genetics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 2/9/2010
Publication Date: 3/7/2010
Citation: Zhang, Z., Ersoz, E., Lai, C., Todhunter, R., Tiwari, H.K., Gore, M.A., Bradbury, P., Yu, J., Arnett, D., Ordovas, J.M., Buckler IV, E.S. 2010. Mixed linear model approach adapted for genome-wide association studies. Nature Genetics. 42:355-360.
Interpretive Summary: Genome-wide association studies (GWAS) have the potential to pinpoint genetic polymorphisms underlying human diseases and agriculturally important traits, but false discoveries are a major concern. False discoveries can be partially attributed to spurious associations caused by population structure and unequal relatedness among individuals in the dataset being analyzed. In the past, general linear model (GLM)-based methods were used to address population substructure. More recently, mixed linear model (MLM) approaches have been preferred, since they simultaneously account for population structure and unequal relatedness among individuals. Although MLM methods are now commonly used, they can be computationally challenging with large datasets. This paper introduces several methods for controlling population structure and relatedness in genome-wide studies and compares them to current methods. The “compressed MLM” method decreases the effective sample size of a dataset by clustering individuals into groups. A complementary approach, in which the “population parameters are previously determined” (P3D), is also introduced. This latter approach eliminates the need to re-compute variance components in an equation, so computing time is markedly reduced without any loss of statistical power. Indeed, in some cases, statistical power improved. This paper applied these methods to genetic association datasets for human, dog, and maize, and it is likely that the methods will be successful in other species as well.
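For readers who want to see the kind of equation the summary refers to, a common textbook way of writing a mixed linear model for association testing is sketched below. The notation is an assumption chosen for illustration; it is not taken from this page and may differ from the paper's own symbols and scaling.

```latex
y = X\beta + Zu + e, \qquad
\operatorname{Var}(u) = K\,\sigma_a^2, \qquad
\operatorname{Var}(e) = I\,\sigma_e^2
```

Here y is the vector of phenotypes, \beta holds the fixed effects (the tested marker and any population-structure covariates), u holds the random genetic effects, K is a kinship matrix (up to a constant scaling), and \sigma_a^2 and \sigma_e^2 are the variance components. In this notation, compression would let u index groups rather than individuals, and P3D would correspond to estimating \sigma_a^2 and \sigma_e^2 once and then holding them fixed while each marker is tested.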
Technical Abstract: Mixed linear model (MLM) methods have proven useful in controlling for population structure and relatedness within genome-wide association studies. However, MLM-based methods can be computationally challenging for large datasets. We report a compression approach, called ‘compressed MLM,’ that decreases the effective sample size of such datasets by clustering individuals into groups. We also present a complementary approach, ‘population parameters previously determined’ (P3D), that eliminates the need to re-compute variance components. We applied these two methods, both singly and combined, in selected genetic association datasets from human, dog and maize. The joint implementation of these two methods markedly reduced computing time and either maintained or improved statistical power. We used simulations to demonstrate the usefulness in controlling for substructure in genetic association datasets for a range of species and genetic architectures. We have made these methods available within an implementation of the software program TASSEL.
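The compression idea in the abstract (clustering individuals into groups so that fewer random effects need to be modeled) can be illustrated with a toy example. The following is a minimal sketch, not the TASSEL implementation: it assumes average-linkage clustering on a kinship matrix and defines group-level kinship as the mean of member-pair kinships. The function names `average_linkage_groups` and `group_kinship` and the example matrix are hypothetical.

```python
from itertools import combinations


def average_linkage_groups(kinship, n_groups):
    """Cluster individuals by average-linkage on a kinship matrix.

    kinship: square list of lists; a larger value means closer relatedness.
    Returns a sorted list of groups, each a sorted list of individual indices.
    (Assumed clustering rule for illustration only.)
    """
    groups = [[i] for i in range(len(kinship))]

    def mean_kinship(a, b):
        # Average kinship over all member pairs drawn from groups a and b.
        return sum(kinship[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(groups) > n_groups:
        # Merge the two groups that are most closely related on average.
        a, b = max(
            combinations(range(len(groups)), 2),
            key=lambda p: mean_kinship(groups[p[0]], groups[p[1]]),
        )
        merged = sorted(groups[a] + groups[b])
        groups = [g for k, g in enumerate(groups) if k not in (a, b)]
        groups.append(merged)
    return sorted(groups)


def group_kinship(kinship, groups):
    """Group-by-group kinship: mean of member-pair kinships (an assumption)."""
    return [
        [
            sum(kinship[i][j] for i in a for j in b) / (len(a) * len(b))
            for b in groups
        ]
        for a in groups
    ]


# Toy kinship matrix: individuals 0 and 1 are close, as are 2 and 3.
K = [
    [1.0, 0.8, 0.1, 0.0],
    [0.8, 1.0, 0.0, 0.1],
    [0.1, 0.0, 1.0, 0.7],
    [0.0, 0.1, 0.7, 1.0],
]
groups = average_linkage_groups(K, 2)
GK = group_kinship(K, groups)
print(groups)
print(GK)
```

In this sketch the four-individual kinship matrix is reduced to a two-by-two group matrix, which is the sense in which compression lowers the effective sample size for the random-effect part of the model. The number of groups is a tuning choice that the sketch leaves to the caller.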