Location: Plant Genetics Research
Title: Yield prediction through integration of genetic, environment, and management data through deep learning
Author
Kick, Daniel
WALLACE, JASON - University Of Georgia
SCHNABLE, JAMES - University Of Nebraska
KOLKMANN, JUDITH - Cornell University
BORIS, ALACA - Goettingen University
BEISSINGER, TIMOTHY - Goettingen University
IRTL, DAVID - Iowa Corn Promotion Board
Flint-Garcia, Sherry
GAGE, JOSEPH - North Carolina State University
HIRSCH, CANDICE - University Of Minnesota
Knoll, Joseph - Joe
DE LEON, NATALIA - University Of Wisconsin
LIMA, DAYANE - University Of Wisconsin
MORETA, DANILO - Cornell University
SINGH, MANINDER - Michigan State University
WELDEKIDAN, TECHLEMARIAN - University Of Delaware
Washburn, Jacob
Submitted to: bioRxiv
Publication Type: Pre-print Publication
Publication Acceptance Date: 7/30/2022
Publication Date: 7/30/2022
Citation: Kick, D.R., Wallace, J.G., Schnable, J.C., Kolkmann, J.M., Boris, A., Beissinger, T.M., Irtl, D., Flint Garcia, S.A., Gage, J.L., Hirsch, C.N., Knoll, J.E., De Leon, N., Lima, D.C., Moreta, D., Singh, M.P., Weldekidan, T., Washburn, J.D. 2022. Yield prediction through integration of genetic, environment, and management data through deep learning. bioRxiv. Article bioRxiv 2022.07.29.502051. https://doi.org/10.1101/2022.07.29.502051.
DOI: https://doi.org/10.1101/2022.07.29.502051
Interpretive Summary: Predicting crop yield for a given cultivar and location is impeded by interaction effects: cultivars do not always behave the same in each environment. More accurate predictions could reduce the time needed to develop new cultivars through genomic selection and help identify cultivars well suited to specific environments. Here we improve maize yield prediction accuracy by using genetic, environmental, and management data, along with interactions between these data types, in a deep learning model. This model is more accurate (on average) than the other models tested, which included both machine learning models and linear models. Interactions between data types are key to this model's performance and change the importance of variables in the data. Additionally, we detail the process of model development to aid others in creating models for their crop of interest or improving upon this model.
Technical Abstract: Accurate prediction of an organism’s phenotype for combinations of genotypes, environments, and management interventions remains a key goal in biology with direct applications to agriculture, research, and conservation. The past decade has seen an expansion of the methods applied towards this aim.
Here we predict maize yield using deep neural networks, compare the efficacy of two model development methods, contextualize model performance using linear and machine learning models, and examine the usefulness of incorporating interactions between disparate data types. Using the best-performing model, we discuss the influence of interactions between data types on the salience of features in the data set.