Location: Plant Genetics Research
Project Number: 5070-21220-046-009-I
Project Type: Interagency Reimbursable Agreement
Start Date: May 21, 2023
End Date: May 18, 2025
Objective:
The purpose of this project is to improve phenotypic prediction, and thereby enable more effective cultivar development, through more accurate and interpretable predictive modeling that incorporates genome-by-environment-by-management effects. Model development will focus on maize due to its economic importance and the availability of large public data sets with extensive phenotype, genetic, environmental, and management data. The first aim of this project is to develop improved models for predicting yield from weather, genomic, soil, and management data using several deep learning approaches, while the second aim focuses on developing models informed by biological theory, enabling greater model interpretability. The final aim is to test whether models optimized for one crop can be repurposed for other crops, requiring less data and fewer computational resources than would be required to create de novo models for each new crop. For this final aim, maize and wheat will be considered.
Approach:
Aim 1. Development of a State-of-the-Art Deep Learning Phenotypic Prediction Model
Deep neural networks will be created that predict yield from exclusively genomic data or from exclusively environmental and management data. For genomic networks, fully connected, convolutional, and recurrent neural network architectures will be considered in combination with several data preprocessing strategies. For environmental and management networks, convolutional and recurrent neural network architectures, as well as pretraining with historical data, will be considered.
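As a rough illustration of the convolutional option for a genomic network, the sketch below runs a single forward pass: one convolutional layer over a marker sequence, a ReLU, global mean pooling, and a linear output for yield. It is a minimal sketch, not the project's model. The allele-count encoding (0, 1, or 2), the kernel sizes, the random weights, and all function names are assumptions made for this example.

```python
import random

def conv1d(seq, kernel):
    """Valid 1D cross-correlation of a marker sequence with one kernel."""
    k = len(kernel)
    return [sum(seq[i + j] * kernel[j] for j in range(k))
            for i in range(len(seq) - k + 1)]

def relu(xs):
    """Element-wise rectified linear unit."""
    return [max(0.0, x) for x in xs]

def genomic_cnn_forward(markers, kernels, out_weights, out_bias):
    """Conv layer -> ReLU -> global mean pooling -> linear yield output."""
    pooled = []
    for kernel in kernels:
        activated = relu(conv1d(markers, kernel))
        pooled.append(sum(activated) / len(activated))
    return sum(w * p for w, p in zip(out_weights, pooled)) + out_bias

# Hypothetical genotype: allele counts (0, 1, or 2) at 12 marker positions.
markers = [0, 1, 2, 2, 0, 1, 1, 0, 2, 0, 1, 2]

# Untrained, randomly initialized parameters (illustration only).
rng = random.Random(0)
kernels = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)]
out_weights = [rng.uniform(-1, 1) for _ in range(4)]

predicted_yield = genomic_cnn_forward(markers, kernels, out_weights, out_bias=0.0)
```

In practice such a network would be built and trained in a deep learning framework; the pure-Python version only makes the data flow explicit.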
Aim 2. Incorporate Genotype-by-Environment Interactions and Biological Theory
In addition to the models produced in Aim 1, a neural network with an architecture informed by known gene pathways in maize will be constructed. This network and the best models from Aim 1 will be used to create interaction models. Three interaction subnetworks will be considered: one that directly predicts yield, one that produces variables for a simplified crop growth model (CGM), and one that produces the same number of variables as the second subnetwork without the constraint that they be CGM parameters. The input features most important to the predictions of the best network will be identified and compared against the literature.
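The second interaction subnetwork can be pictured as a layer whose outputs are constrained to be valid CGM parameters, which are then passed through the growth model to obtain yield. The sketch below is a toy version under stated assumptions: the two parameters (radiation use efficiency and harvest index), the one-line growth model, and every function name are hypothetical choices for illustration and are not taken from the project.

```python
import math

def softplus(x):
    """Smooth map from any real number to a positive value."""
    return math.log1p(math.exp(x))

def sigmoid(x):
    """Smooth map from any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def interaction_to_cgm_params(genomic_emb, env_emb, weights):
    """Map concatenated embeddings to two constrained CGM parameters.

    weights: dict with keys 'rue' and 'hi', each a (weight list, bias) tuple.
    """
    features = list(genomic_emb) + list(env_emb)

    def linear(name):
        w, b = weights[name]
        return sum(wi * fi for wi, fi in zip(w, features)) + b

    rue = softplus(linear("rue"))          # assumed parameter: must be > 0
    harvest_index = sigmoid(linear("hi"))  # assumed parameter: fraction in (0, 1)
    return rue, harvest_index

def simple_cgm(rue, harvest_index, daily_radiation, intercepted_fraction):
    """Toy CGM: biomass = sum(RUE * intercepted radiation); yield = HI * biomass."""
    biomass = sum(rue * rad * frac
                  for rad, frac in zip(daily_radiation, intercepted_fraction))
    return harvest_index * biomass

# Hypothetical embeddings from the genomic and environmental networks.
weights = {"rue": ([0.1, -0.2, 0.3, 0.0], 0.5), "hi": ([0.0, 0.1, -0.1, 0.2], 0.0)}
rue, hi = interaction_to_cgm_params([1.0, 2.0], [3.0, 4.0], weights)
predicted_yield = simple_cgm(rue, hi, [10.0, 20.0, 15.0], [0.3, 0.8, 0.9])
```

Because the final step is a mechanistic model, the intermediate outputs keep a biological meaning. That is the interpretability advantage over the first subnetwork (direct yield prediction) and the third (unconstrained variables), neither of which is shown here.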
Aim 3. Assess Transferability of Trained Models to Other Crops and Multi-Crop Models
Networks from previous aims will be used in a proof of concept for using models or data from one crop to improve predictions for another. Two methods will be considered: transfer learning, where a model trained for one crop is used as a starting point and tuned for the target crop, and multitarget learning, where a network capable of predicting yield for two crops at once is trained. Maize and wheat will be used here, and the effects of the amount of crop-specific data available for tuning and of the ratio of crop-specific data available will be examined.
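The transfer learning idea can be sketched with a deliberately small stand-in: a linear model trained by gradient descent on plentiful "source crop" data, then tuned for a few steps on scarce "target crop" data, compared with a model trained from scratch on the target data for the same number of steps. The data, weights, learning rate, and function names below are all invented for illustration; multitarget learning is not shown.

```python
def predict(w, x):
    """Linear prediction without a bias term."""
    return sum(wi * xi for wi, xi in zip(w, x))

def mse(w, xs, ys):
    """Mean squared error of the model on a data set."""
    return sum((predict(w, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def train(w, xs, ys, steps, lr=0.1):
    """Full-batch gradient descent on mean squared error; returns new weights."""
    w = list(w)
    n = len(xs)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y in zip(xs, ys):
            err = predict(w, x) - y
            for j, xj in enumerate(x):
                grad[j] += 2.0 * err * xj / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

# Hypothetical "source crop" data (plentiful) and "target crop" data (scarce).
source_x = [(1, 0), (0, 1), (1, 1), (2, 1)]
source_y = [2.0, 1.0, 3.0, 5.0]   # generated from weights (2.0, 1.0)
target_x = [(1, 0), (0, 1), (1, 1)]
target_y = [2.2, 0.8, 3.0]        # generated from weights (2.2, 0.8)

# Train on the source crop, then tune briefly on the target crop.
source_w = train([0.0, 0.0], source_x, source_y, steps=500)
tuned_w = train(source_w, target_x, target_y, steps=5)

# Baseline: same tuning budget, but starting from zero weights.
scratch_w = train([0.0, 0.0], target_x, target_y, steps=5)

tuned_loss = mse(tuned_w, target_x, target_y)
scratch_loss = mse(scratch_w, target_x, target_y)
```

In this toy setup the tuned model reaches a lower target error than the from-scratch model within the same five steps, because the two crops were given similar underlying weights. Whether that similarity holds for real crops is the question the aim sets out to test.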