Research Project: Genetic and Physiological Mechanisms Underlying Complex Agronomic Traits in Grain Crops

Location: Plant Genetics Research

Title: Ensemble of best linear unbiased predictor, machine learning and deep learning models predict maize yield better than each model alone

Authors:
Kick, Daniel
Washburn, Jacob

Submitted to: in silico Plants
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 9/19/2023
Publication Date: 9/25/2023
Citation: Kick, D.R., Washburn, J.D. 2023. Ensemble of best linear unbiased predictor, machine learning and deep learning models predict maize yield better than each model alone. in silico Plants. 5(2). Article diad015. https://doi.org/10.1093/insilicoplants/diad015.
DOI: https://doi.org/10.1093/insilicoplants/diad015

Interpretive Summary: More accurately predicting how a crop variety will perform in an environment enables faster development of new crop varieties with favorable characteristics such as drought resistance or higher yield. Much focus has been given to what type of predictive model best represents the complex interactions between genes and environmental conditions. We show that if multiple models are used together the predictions are often better than those of the separate models. This work supports efforts to increase the quality and quantity of agricultural products by increasing the speed and accuracy of crop improvement.

Technical Abstract: Predicting phenotypes accurately from genomic, environment and management factors is key to accelerating the development of novel cultivars with desirable traits. Inclusion of management and environmental factors enables in silico studies to predict the effect of specific management interventions or future climates. Despite the value such models would confer, much work remains to improve the accuracy of phenotypic predictions. Rather than advocate for a single specific modelling strategy, here we demonstrate within large multi-environment and multi-genotype maize trials that combining predictions from disparate models using simple ensemble approaches most often results in better accuracy than using any one of the models on its own. We investigated various ensemble combinations of different model types, model numbers and model weighting schemes to determine the accuracy of each. We find that ensembling generally improves performance even when combining only two models. The number and type of models included alter accuracy, with improvements diminishing as the number of models included increases. Using a genetic algorithm to optimize ensemble composition reveals that, when weighted by the inverse of each model’s expected error, a combination of best linear unbiased predictor, linear fixed effects, deep learning, random forest and support vector regression models performed best on this dataset.
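
Illustrative sketch (not from the publication): the abstract describes weighting each model by the inverse of its expected error. The Python below shows one simple way such a weighting could work. Everything in it is an assumption made for illustration: the function names, the use of RMSE as the error measure, and the toy yield values are hypothetical, and the abstract does not say how expected error was estimated. For brevity the weights are applied to the same data used to estimate the errors; in practice the errors would come from held-out data.

```python
from math import sqrt


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def inverse_error_weights(errors):
    """Weights proportional to 1 / error, normalized to sum to 1."""
    inverse = [1.0 / e for e in errors]
    total = sum(inverse)
    return [w / total for w in inverse]


def weighted_ensemble(predictions, weights):
    """Weighted average across models for each observation.

    predictions: one list of predicted values per model.
    """
    return [
        sum(w * p for w, p in zip(weights, column))
        for column in zip(*predictions)
    ]


# Hypothetical observed yields and predictions from three models.
y_valid = [10.0, 12.0, 14.0, 16.0]
model_preds = [
    [11.0, 13.0, 15.0, 17.0],  # consistently 1.0 too high
    [8.0, 10.0, 12.0, 14.0],   # consistently 2.0 too low
    [10.0, 12.0, 14.0, 20.0],  # exact except for one large miss
]

# Assumed error estimate: RMSE of each model against the observed values.
errors = [rmse(y_valid, p) for p in model_preds]
weights = inverse_error_weights(errors)
ensemble = weighted_ensemble(model_preds, weights)

print("per-model RMSE:", errors)
print("weights:", weights)
print("ensemble RMSE:", rmse(y_valid, ensemble))
```

In this toy case the model with the lowest error receives the largest weight, and because the models err in different directions the weighted average ends up closer to the observed values than any single model. That is a property of the made-up numbers, not a guarantee.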