Research Project: Development of Climate Resilient Germplasm and Management Tools for Sustainable Row Crop Production

Location: Plant Stress and Germplasm Development Research

Title: Yield prediction in a peanut breeding program using remote sensing data and machine learning algorithms

Author
Pugh, Nicholas - Ace
Young, Andrew
OIJHA, MANISHA - New Mexico State University
Emendack, Yves
Sanchez, Jacobo
Xin, Zhanguo
PUPPALA, NAVEEN - New Mexico State University

Submitted to: Frontiers in Plant Science
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 2/2/2024
Publication Date: 2/20/2024
Citation: Pugh, N.A., Young, A.C., Oijha, M., Emendack, Y., Sanchez, J., Xin, Z., Puppala, N. 2024. Yield prediction in a peanut breeding program using remote sensing data and machine learning algorithms. Frontiers in Plant Science. 15. https://doi.org/10.3389/fpls.2024.1339864.
DOI: https://doi.org/10.3389/fpls.2024.1339864

Interpretive Summary: This study is about improving the way we select and breed peanuts, an important food crop around the world. We focused on finding a better method to predict how much a peanut plot will produce, which is a big challenge because peanut pods develop underground. To do this, we used drones (unmanned aerial vehicles, or UAVs) to take detailed pictures of peanut plants as they grew in 2021 and 2022. From these images, we created growth curves for the peanut plants, tracking things like how much ground area the plants cover and how tall they are. Then, we used computer techniques known as machine learning models to analyze these growth curves, extract important information from them, and predict peanut yield. We specifically used two types of machine learning models: random forest and eXtreme Gradient Boosting (XGBoost). The random forest model predicted peanut yields well, explaining about 93% of the variation in yield (R2 = 0.93). The XGBoost model was also effective, explaining about 88% (R2 = 0.88). These models were not only good at estimating how much a peanut plot would produce, but they were also useful in identifying which peanut plots performed poorly and which ones were superior. This is especially helpful for peanut breeders who want to choose the best plants for future crops.
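
The percentages quoted in the summary correspond to the R2 values reported in the technical abstract. As a rough illustration only, the sketch below computes R2 in its common "one minus residual over total sum of squares" form. The yield numbers are made up, and the record does not state which exact R2 formula the authors used, so the function and its inputs are assumptions.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Hypothetical plot yields (arbitrary units), not data from the study
observed_yield = [2.0, 3.0, 4.0, 5.0, 6.0]
predicted_yield = [2.5, 3.0, 3.5, 5.0, 6.0]

r2 = r_squared(observed_yield, predicted_yield)
print(round(r2, 2))  # prints 0.95
```

An R2 of 0.95 here means the predictions account for about 95% of the variation among the observed yields, which is a different quantity from the share of plots classified correctly.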

Technical Abstract: Peanut is a critical food crop worldwide, and the development of high-throughput phenotyping techniques is essential for enhancing the crop's genetic gain rate. Given the obvious challenges of directly estimating peanut yields through remote sensing, an approach that utilizes above-ground phenotypes to estimate underground yield is necessary. To that end, this study leveraged unmanned aerial vehicles (UAVs) for high-throughput phenotyping of surface traits in peanut. Using a diverse set of peanut germplasm planted in 2021 and 2022, UAV flight missions were repeatedly conducted to capture image data that were used to construct high-resolution multitemporal sigmoidal growth curves based on apparent characteristics, such as canopy cover and canopy height. Latent features extracted from these growth curves and their first derivatives informed the development of advanced machine learning models, specifically random forest and eXtreme Gradient Boosting (XGBoost), to estimate yield in the peanut plots. The random forest model exhibited exceptional predictive accuracy (R2 = 0.93), while XGBoost was also reasonably effective (R2 = 0.88). When using confusion matrices to evaluate the classification abilities of each model, the two models proved valuable in a breeding pipeline, particularly for filtering out underperforming genotypes. In addition, the random forest model excelled in identifying top-performing material while minimizing Type I and Type II errors. Overall, these findings underscore the potential of machine learning models, especially random forests and XGBoost, in predicting peanut yield and improving the efficiency of peanut breeding programs.
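
The growth-curve step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes a three-parameter logistic form for the sigmoidal curve, and the parameter values and feature names are hypothetical, since the abstract does not specify the sigmoid or the extracted features. It shows how summary features could be read off a fitted curve and its first derivative before being passed to a model such as random forest or XGBoost.

```python
import math

def logistic(t, upper, rate, t_mid):
    """Three-parameter logistic curve (an assumed sigmoid form)."""
    return upper / (1.0 + math.exp(-rate * (t - t_mid)))

def logistic_derivative(t, upper, rate, t_mid):
    """First derivative of the logistic curve with respect to time."""
    y = logistic(t, upper, rate, t_mid)
    return rate * y * (1.0 - y / upper)

def curve_features(upper, rate, t_mid, days):
    """Summarize one plot's curve; the feature names are hypothetical."""
    growth = [logistic_derivative(d, upper, rate, t_mid) for d in days]
    peak_index = max(range(len(days)), key=growth.__getitem__)
    return {
        "final_value": logistic(days[-1], upper, rate, t_mid),
        "max_growth_rate": growth[peak_index],
        "day_of_max_growth": days[peak_index],
    }

# Hypothetical canopy-height curve for one plot:
# upper asymptote in cm, rate in 1/day, midpoint in days after planting
features = curve_features(upper=40.0, rate=0.1, t_mid=60.0,
                          days=list(range(0, 141)))
print(features)
```

For a logistic curve, the growth rate peaks at the midpoint with value upper × rate / 4, so this hypothetical plot grows fastest at day 60, at 1 cm per day. In practice the curve parameters would first be estimated from the repeated UAV measurements for each plot.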