Research Project: Integrating Remote Sensing, Measurements and Modeling for Multi-Scale Assessment of Water Availability, Use, and Quality in Agroecosystems

Location: Hydrology and Remote Sensing Laboratory

Title: Comparative assessment of environmental variables and machine learning algorithms for maize yield prediction in the US Midwest

Author
KANG, Y. - University of Wisconsin
OZDOGAN, M. - University of Wisconsin
ZHU, X. - University of Wisconsin
YE, Z. - University of Wisconsin
HAIN, C. - NASA Marshall Space Flight Center
Anderson, Martha

Submitted to: Environmental Research Letters
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 4/28/2020
Publication Date: 5/19/2020
Citation: Kang, Y., Ozdogan, M., Zhu, X., Ye, Z., Hain, C., Anderson, M.C. 2020. Comparative assessment of environmental variables and machine learning algorithms for maize yield prediction in the US Midwest. Environmental Research Letters. 15:064005. https://doi.org/10.1088/1748-9326/ab7df9.
DOI: https://doi.org/10.1088/1748-9326/ab7df9

Interpretive Summary: Foreign and domestic yield estimation is a critical function of the USDA. Yield estimates have traditionally been based on weather observations, but over the past few decades satellite imagery and other types of geospatial data have played an increasingly important role in monitoring and forecasting yield. Given the operational cost of ingesting new datasets into existing monitoring systems, it is important to identify which data are of most value for a given crop and region. This paper describes a machine learning approach for testing the relative value of an extensive set of weather data, satellite observations, land-surface model outputs, soil maps, and crop progress reports in predicting corn yields over the U.S. Midwest. Of highest value were multi-wavelength satellite indices describing vegetation green chlorophyll content and biomass. Other satellite indices describing crop water stress and moisture availability also ranked highly among the variables tested. Studies like this help to inform improvements to current yield estimation strategies by focusing on new datasets that are demonstrated to add the most significant value.

Technical Abstract: Crop yield estimates over large areas are conventionally made using weather observations, but a comprehensive understanding of the effects of various environmental indicators and the choice of prediction algorithm remains elusive. Here we present a thorough assessment of county-level maize yield prediction in the U.S. Midwest using six machine learning algorithms and an extensive set of environmental variables derived from satellite observations, weather data, land surface model results, soil maps, and crop progress reports. Results show that seasonal crop yield forecasting benefits from both more advanced algorithms and a large composite of information associated with crop canopy, weather, and soil (i.e., hundreds of features). Combining the best algorithm, inputs, and observation frequency improves the prediction accuracy by up to 7.9% compared to a baseline statistical model using only climatic and satellite observations. This study provides insights into practical crop yield forecasting and the understanding of yield response to climatic and environmental conditions.
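
The abstract reports an accuracy gain relative to a baseline model but does not say which error metric was used. As a minimal sketch of how such a comparison could be expressed, the example below assumes root-mean-square error (RMSE) as the metric and computes the percent reduction in error of a candidate model relative to a baseline. The function names `rmse` and `relative_improvement` are hypothetical, and the yield values are made up purely for illustration; they are not data or results from the study.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between two equal-length sequences."""
    if len(observed) != len(predicted):
        raise ValueError("sequences must have the same length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def relative_improvement(baseline_error, candidate_error):
    """Percent reduction in error relative to the baseline (positive = better)."""
    return 100.0 * (baseline_error - candidate_error) / baseline_error


# Made-up county-level yields (t/ha); illustrative only, not from the paper.
observed = [10.0, 11.0, 9.0, 12.0]
baseline_pred = [11.0, 10.0, 10.0, 11.0]   # hypothetical baseline model output
candidate_pred = [10.5, 10.5, 9.5, 11.5]   # hypothetical richer model output

baseline_rmse = rmse(observed, baseline_pred)
candidate_rmse = rmse(observed, candidate_pred)
gain = relative_improvement(baseline_rmse, candidate_rmse)

print(f"baseline RMSE:  {baseline_rmse:.2f}")
print(f"candidate RMSE: {candidate_rmse:.2f}")
print(f"improvement:    {gain:.1f}%")
```

Expressing the gain relative to the baseline error, rather than as an absolute difference, keeps the figure comparable across regions or years with different typical yield levels. Other metrics could be substituted for RMSE without changing the structure of the comparison.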