Title: An alternative covariance estimator to investigate genetic heterogeneity in populations

Authors:
Heslot, Nicolas - Cornell University
Jannink, Jean-Luc

Submitted to: Genetics Selection Evolution
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 11/12/2015
Publication Date: 11/26/2015
Citation: Heslot, N., Jannink, J. 2015. An alternative covariance estimator to investigate genetic heterogeneity in populations. Genetics Selection Evolution. 47:93. doi: 10.1186/s12711-015-0171-z.

Interpretive Summary: Genomic prediction and genome-wide association studies (GWAS) use mixed models to predict traits and to identify associations between markers and traits. In both cases, the genetic covariance between individuals is estimated from molecular markers. Statistical theory treats these estimates as error-free, when in fact they are not. According to theory, adding individuals to the training population should therefore always increase accuracy, but empirical evidence suggests that accuracy sometimes decreases. Even at high marker density and under a genetic architecture with many small additive loci, the genetic covariance between individuals, which depends on causal loci, may not be well estimated by the whole-genome covariance. We propose an alternative covariance estimator, or kernel, named the K-kernel, to account for varying levels of uncertainty in the relationship matrix. The K-kernel allows some covariances to be set to zero and thus may lead to appropriate trimming of the training population to maximize accuracy. The K-kernel is compared to known kernels on a number of datasets for fit to the data, cross-validated accuracy, and suitability for GWAS. We show that alternative kernels usually provide a significantly better fit to the data than the standard kinship. This better fit can increase prediction accuracy and, in GWAS, lead to better type I error control and greater detection power.

Technical Abstract: Genomic prediction and genome-wide association studies (GWAS) use mixed models for trait prediction and for identification of associations. In both cases, the covariance between individuals for performance is estimated using molecular markers. Mixed model properties indicate that the use of the data for prediction is optimal if the covariance is known. Under this assumption, adding individuals to the training population should never be detrimental, but empirical evidence suggests that a larger training population can decrease prediction accuracy. A good estimate of the covariance between individuals is also needed for GWAS using mixed models. Recent theoretical results showed that, even at high marker density and under a genetic architecture with many small additive loci, the covariance between individuals, which depends on the relationship at causal loci, is not well estimated by the whole-genome kinship for distantly related individuals. This problem cannot be solved by simple shrinkage of the whole-genome kinship, because the level of uncertainty in the relationship matrix coefficients depends not on their numerical values but on their expectation based on pedigree. We propose an alternative covariance estimator, or kernel, named the K-kernel, to account for varying levels of uncertainty in the relationship matrix. The K-kernel can allow sparsity and, as a consequence, can be used for training population optimization. The K-kernel is compared to known kernels on a number of datasets for fit to the data, cross-validated accuracy, and suitability for GWAS. We show that alternative kernels usually provide a significantly better fit to the data than the simple whole-genome kinship and, depending on the trait, increase cross-validated accuracy. In GWAS, they tend to control type I errors as well as or better than the whole-genome kinship and to increase statistical power.
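
To make the terms above concrete, the following minimal sketch shows what a marker-based whole-genome kinship and a sparse covariance matrix can look like. It rests on assumptions that are not taken from the paper: the kinship uses a common allele-frequency-centered cross-product estimator, the function names and the threshold are hypothetical, and the hard-thresholding step only illustrates setting some covariances to zero. It is not the K-kernel, whose definition is not given in this record.

```python
import numpy as np


def whole_genome_kinship(markers):
    """Marker-based kinship (assumed estimator, not taken from the paper).

    markers: (n_individuals, n_markers) array of allele counts coded 0/1/2.
    """
    markers = np.asarray(markers, dtype=float)
    freqs = markers.mean(axis=0) / 2.0          # allele frequency per marker
    centered = markers - 2.0 * freqs            # center each marker column
    scale = 2.0 * np.sum(freqs * (1.0 - freqs))  # expected heterozygosity sum
    return centered @ centered.T / scale


def zero_small_covariances(kinship, threshold):
    """Hypothetical illustration of sparsity: zero out weak off-diagonal terms.

    This is plain hard thresholding, NOT the K-kernel from the paper.
    """
    sparse = np.where(np.abs(kinship) >= threshold, kinship, 0.0)
    np.fill_diagonal(sparse, np.diag(kinship))  # always keep the variances
    return sparse


# Simulated genotypes for 6 individuals and 200 markers (illustrative only).
rng = np.random.default_rng(0)
markers = rng.integers(0, 3, size=(6, 200))

K = whole_genome_kinship(markers)
K_sparse = zero_small_covariances(K, threshold=0.1)
```

Hard thresholding does not guarantee a positive semi-definite matrix, and it acts on the numerical values of the coefficients, which the abstract argues is not where the uncertainty comes from. The sketch therefore illustrates only the vocabulary (kinship, sparsity), not the proposed method.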