Author
HESLOT, NICOLAS - Cornell University
AKDEMIR, DENIZ - Cornell University
SORRELLS, MARK - Cornell University
Jannink, Jean-Luc
Submitted to: Theoretical and Applied Genetics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 10/31/2013
Publication Date: 11/22/2013
Citation: Heslot, N., Akdemir, D., Sorrells, M.E., Jannink, J. 2013. Integrating environmental covariates and crop modeling into the genomic selection framework to predict genotype by environment interactions. Theoretical and Applied Genetics. 127:463-480.
Interpretive Summary: A key difficulty in plant breeding is that breeding lines respond differently to different environments, a phenomenon called genotype by environment interaction (G*E). Weather data are available for evaluation environments and can be used to characterize each of them. Weather variables are difficult to use for predicting G*E: they are correlated with each other, and each has only a weak relationship with G*E. In addition, non-linear responses of genotypes to weather stress are expected to further complicate the analysis. Genomic selection can successfully predict breeding line performance using marker data, which share some of these characteristics of weather data. Using a crop model to derive stress variables from daily weather data, we propose models based on genomic selection to predict G*E. We also develop a new method, soft rule fit, to improve this model and capture non-linear responses of genes to stresses. The method is tested using a large winter wheat dataset, representative of the type of data available in a large-scale commercial breeding program. Accuracy in predicting genotype performance in unobserved environments for which weather data were available increased by 11% on average, and the variability in prediction accuracy decreased by 11%. By leveraging agronomic knowledge and the large datasets generated by breeding programs, this new model provides insight into the genetic architecture of genotype by environment interactions and could predict genotype performance based on past and future weather scenarios.
Technical Abstract: Genotype by environment interaction (G*E) is one of the key issues when analyzing phenotypes. The use of environment data to model G*E has long been a subject of interest but is limited by the same problems as those addressed by genomic selection (GS) methods: a large number of correlated predictors, each explaining a small amount of the total variance. In addition, non-linear responses of genotypes to stresses are expected to further complicate the analysis. Using a crop model to derive stress covariates from daily weather data for predicted crop development stages, we propose an extension of the factorial regression model to genomic selection. This model is further extended to the marker level, enabling the modeling of quantitative trait loci (QTL) by environment interaction (Q*E) on a genome-wide scale. A newly developed ensemble method, soft rule fit, was used to improve this model and capture non-linear responses of QTL to stresses. The method is tested using a large winter wheat dataset, representative of the type of data available in a large-scale commercial breeding program. Accuracy in predicting genotype performance in unobserved environments for which weather data were available increased by 11.1% on average, and the variability in prediction accuracy decreased by 10.8%. By leveraging agronomic knowledge and the large historical datasets generated by breeding programs, this new model provides insight into the genetic architecture of genotype by environment interactions and could predict genotype performance based on past and future weather scenarios.
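Illustrative sketch: the marker-level extension described in the abstract can be pictured as a regression on products of marker scores and environmental stress covariates. The Python code below is only a minimal sketch under stated assumptions and is not the authors' implementation. The function names (`interaction_features`, `ridge_fit`), the simulated data, and the penalty value are all hypothetical, and plain ridge regression stands in for the paper's models; soft rule fit is not reproduced here.

```python
import numpy as np


def interaction_features(markers, covariates, geno_idx, env_idx):
    """Build marker-by-covariate interaction predictors (hypothetical helper).

    markers:    (n_genotypes, n_markers) marker scores
    covariates: (n_envs, n_covariates) stress covariates per environment
    geno_idx, env_idx: (n_obs,) integer indices of each observation
    Returns an (n_obs, n_markers * n_covariates) matrix whose column
    j * n_covariates + k holds marker j times covariate k.
    """
    m = markers[geno_idx]        # (n_obs, n_markers)
    w = covariates[env_idx]      # (n_obs, n_covariates)
    return (m[:, :, None] * w[:, None, :]).reshape(len(geno_idx), -1)


def ridge_fit(X, y, lam):
    """Ridge regression in dual form: beta = X^T (X X^T + lam I)^-1 y."""
    n = X.shape[0]
    alpha = np.linalg.solve(X @ X.T + lam * np.eye(n), y)
    return X.T @ alpha


# Simulated data (assumption): 20 genotypes, 50 markers, 6 environments,
# 3 stress covariates, every genotype observed in every environment.
rng = np.random.default_rng(0)
markers = rng.choice([-1.0, 0.0, 1.0], size=(20, 50))
covariates = rng.normal(size=(6, 3))
geno_idx = np.repeat(np.arange(20), 6)
env_idx = np.tile(np.arange(6), 20)

X = interaction_features(markers, covariates, geno_idx, env_idx)
true_beta = rng.normal(scale=0.1, size=X.shape[1])
y = X @ true_beta + rng.normal(scale=0.1, size=X.shape[0])

beta_hat = ridge_fit(X, y, lam=1.0)   # lam chosen arbitrarily for the sketch
y_hat = X @ beta_hat
```

Because the predictors depend on the environment only through its covariates, a new environment with known weather-derived covariates can be scored by building its interaction rows and multiplying by `beta_hat`. In practice the penalty would be tuned, for example by cross-validation across environments, and main effects of genotype and environment would typically be modeled alongside the interaction terms.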