
Research Project: Genetic and Physiological Mechanisms Underlying Complex Agronomic Traits in Grain Crops

Location: Plant Genetics Research

Title: Yield prediction through integration of genetic, environment, and management data through deep learning

Author
Kick, Daniel
WALLACE, JASON - University Of Georgia
SCHNABLE, JAMES - University Of Nebraska
KOLKMANN, JUDITH - Cornell University
ALACA, BORIS - Goettingen University
BEISSINGER, TIMOTHY - Goettingen University
Edwards, Jode
ERTL, DAVID - Iowa Corn Promotion Board
Flint-Garcia, Sherry
GAGE, JOSEPH - North Carolina State University
HIRSCH, CANDICE - University Of Minnesota
Knoll, Joseph - Joe
DE LEON, NATALIA - University Of Wisconsin
LIMA, DAYANE - University Of Wisconsin
MORETA, DANILO - Cornell University
SINGH, MANINDER - Michigan State University
THOMPSON, ADDIE - Michigan State University
WELDEKIDAN, TECHLEMARIAM - University Of Delaware
Washburn, Jacob

Submitted to: G3, Genes/Genomes/Genetics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 12/22/2022
Publication Date: 4/1/2023
Citation: Kick, D.R., Wallace, J.G., Schnable, J.C., Kolkmann, J.M., Alaca, B., Beissinger, T.M., Edwards, J.W., Ertl, D., Flint-Garcia, S.A., Gage, J.L., Hirsch, C.N., Knoll, J.E., de Leon, N., Lima, D.C., Moreta, D., Singh, M.P., Thompson, A., Weldekidan, T., Washburn, J.D. 2023. Yield prediction through integration of genetic, environment, and management data through deep learning. G3, Genes/Genomes/Genetics. 13(4). Article jkad006. https://doi.org/10.1093/g3journal/jkad006.
DOI: https://doi.org/10.1093/g3journal/jkad006

Interpretive Summary: Predicting crop yield for a given cultivar and location is impeded by interaction effects: cultivars do not always behave the same way in every environment. More accurate predictions could shorten the time needed to develop new cultivars by helping genomic selection identify cultivars well suited to specific environments. Here we improve maize yield prediction accuracy by using genetic, environmental, and management data, along with interactions between these data types, in a deep learning model. On average, this model is more accurate than the other models tested, which include both machine learning models and linear models. Interactions between data types are key to this model's performance and change the importance of variables in the data. Additionally, we detail the process of model development to help others create models for their crop of interest or improve upon this model.
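To make "interactions between data types" concrete, the sketch below builds explicit cross-type interaction features (genetic x environment, genetic x management, environment x management) as products of individual features. This is a minimal illustration under assumed inputs: the function name `interaction_features` and the toy feature values are hypothetical, and it is not the encoding used in the study.

```python
from itertools import product


def interaction_features(genetic, environment, management):
    """Hypothetical helper: main effects plus pairwise cross-type products.

    Each argument is a list of floats for one data type. Products are formed
    only between different data types (G x E, G x M, E x M), never within one.
    """
    blocks = {"G": genetic, "E": environment, "M": management}
    features = {}
    # Main effects: keep every original feature, tagged by its data type.
    for name, values in blocks.items():
        for i, v in enumerate(values):
            features[f"{name}{i}"] = v
    # Two-way interactions between different data types.
    for a, b in [("G", "E"), ("G", "M"), ("E", "M")]:
        for (i, x), (j, y) in product(enumerate(blocks[a]), enumerate(blocks[b])):
            features[f"{a}{i}x{b}{j}"] = x * y
    return features


# Toy example: 2 genetic markers, 2 weather values, 1 management value.
feats = interaction_features([1.0, 0.0], [25.0, 3.5], [80.0])
print(len(feats))  # 5 main effects + 4 GxE + 2 GxM + 2 ExM = 13
```

A linear model fit on such features can capture interaction effects explicitly, whereas a deep network can learn them implicitly from the concatenated inputs.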

Technical Abstract: Accurate prediction of the phenotypic outcomes produced by different combinations of genotypes, environments, and management interventions remains a key goal in biology with direct applications to agriculture, research, and conservation. The past decades have seen an expansion of new methods applied toward this goal. Here we predict maize yield using deep neural networks, compare the efficacy of 2 model development methods, and contextualize model performance using conventional linear and machine learning models. We examine the usefulness of incorporating interactions between disparate data types. We find deep learning and best linear unbiased predictor (BLUP) models with interactions had the best overall performance. BLUP models achieved the lowest average error, but deep learning models performed more consistently with similar average error. Optimizing deep neural network submodules for each data type improved model performance relative to optimizing the whole model for all data types at once. Examining the effect of interactions in the best-performing model revealed that including interactions altered the model's sensitivity to weather and management features, including a reduction of the importance scores for timepoints expected to have a limited physiological basis for influencing yield—those at the extreme end of the season, nearly 200 days post planting. Based on these results, deep learning provides a promising avenue for the phenotypic prediction of complex traits in complex environments and a potential mechanism to better understand the influence of environmental and genetic factors.
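The idea of building a separate submodule for each data type, then combining them, can be sketched as a small feed-forward network: one branch each for genetic, environmental, and management inputs, whose outputs are concatenated and passed through shared nonlinear layers that can learn non-additive (interaction) effects. Because each branch has its own width, each submodule can be tuned on its own before the whole model is assembled. This is a minimal pure-Python forward pass under stated assumptions: the class `ModularYieldNet`, the helpers `dense` and `forward`, and all layer sizes are hypothetical placeholders, not the authors' architecture, and training is omitted.

```python
import math
import random


def dense(n_in, n_out, rng):
    """Hypothetical helper: random weights and zero biases for one dense layer."""
    w = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)] for _ in range(n_out)]
    b = [0.0] * n_out
    return w, b


def forward(layer, x, activation=True):
    """Apply one dense layer, optionally followed by a ReLU nonlinearity."""
    w, b = layer
    out = [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]
    return [max(0.0, v) for v in out] if activation else out


class ModularYieldNet:
    """Sketch: one submodule per data type, then a shared head that mixes them."""

    def __init__(self, n_g, n_e, n_m, h_g=8, h_e=8, h_m=4, h_head=8, seed=0):
        rng = random.Random(seed)
        # Each submodule has its own (separately tunable) width.
        self.g = dense(n_g, h_g, rng)  # genetic submodule
        self.e = dense(n_e, h_e, rng)  # environment (e.g. weather, soil) submodule
        self.m = dense(n_m, h_m, rng)  # management submodule
        # Shared layer over the concatenated submodule outputs.
        self.head = dense(h_g + h_e + h_m, h_head, rng)
        self.out = dense(h_head, 1, rng)  # scalar yield prediction

    def predict(self, g, e, m):
        # Concatenate the three submodule outputs (list concatenation).
        z = forward(self.g, g) + forward(self.e, e) + forward(self.m, m)
        h = forward(self.head, z)  # nonlinear mixing across data types
        return forward(self.out, h, activation=False)[0]


net = ModularYieldNet(n_g=4, n_e=6, n_m=2, seed=1)
y = net.predict([1, 0, 1, 0], [0.1] * 6, [1.0, 0.5])
print(y)  # an untrained, arbitrary yield value
```

With untrained random weights the prediction is meaningless; the point is the structure, where each branch can be resized or replaced (for example, with a layer suited to weather time series) without touching the others.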