Research Project: Sustainable and Resilient Crop Production Systems Based on the Quantification and Modeling of Genetic, Environment, and Management Factors

Location: Adaptive Cropping Systems Laboratory

Title: Cotton yield prediction: A machine learning approach with field and synthetic data

Author
Mitra, Alakananda - University of Nebraska
Beegum, Sahila - University of Nebraska
Fleisher, David
Reddy, Vangimalla
Sun, Wenguang - Colorado State University
Ray, Chittaranjan - University of Nebraska
Timlin, Dennis
Malakar, Arindam - University of Nebraska

Submitted to: IEEE Access
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 6/6/2024
Publication Date: 6/24/2024
Citation: Mitra, A., Beegum, S., Fleisher, D.H., Reddy, V., Sun, W., Ray, C., Timlin, D.J., Malakar, A. 2024. Cotton yield prediction: A machine learning approach with field and synthetic data. IEEE Access. 12:101273-101288. https://doi.org/10.1109/access.2024.3418139.
DOI: https://doi.org/10.1109/access.2024.3418139

Interpretive Summary: Smart sustainable agriculture is the future of agriculture: it is needed to counter the adverse effects of climate change, unprecedented population growth, extreme weather events, and natural resource depletion, and to ensure food security. It promotes maximizing food production while minimizing environmental impact. The ability to estimate crop yield is a useful component of smart sustainable agriculture and is needed to evaluate the best on-farm management practices. This paper presents a method for predicting crop yield using Artificial Intelligence (AI) and heat units (growing degree days). Cotton was selected as the study crop. The study area was the U.S. cotton belt and included nine farm locations within three southern states. The process-based GOSSYM model was also used to generate additional cotton yield data associated with potential future climate change. Several machine learning (ML) algorithms that differed in their formulations were tested. Two of the ML algorithms showed very high accuracy, demonstrating that a simple AI-based approach can be used to simulate crop yields across different weather, variety, and soil type characteristics. These results can strengthen smart sustainable agriculture by helping to provide decision support to farmers and crop consultants.
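As an illustration of the heat unit (growing degree day) idea mentioned in the summary, the following is a minimal sketch of one common way to accumulate heat units from daily maximum and minimum temperatures. The averaging formula, the 15.6 °C base temperature (a convention often used for cotton), the function names, and the example temperatures are all assumptions made for illustration; the summary does not state how heat units were computed in the study.

```python
def daily_heat_units(t_max_c, t_min_c, t_base_c=15.6):
    """Heat units for one day: mean air temperature above the base, floored at zero.

    The 15.6 deg C default is an assumed base temperature, not a value from the study.
    """
    return max((t_max_c + t_min_c) / 2.0 - t_base_c, 0.0)


def accumulated_heat_units(daily_temps, t_base_c=15.6):
    """Sum daily heat units over a sequence of (t_max_c, t_min_c) pairs."""
    return sum(daily_heat_units(t_max, t_min, t_base_c) for t_max, t_min in daily_temps)


# Hypothetical temperatures (deg C) for four days; not data from the study.
example_days = [(30.0, 20.0), (28.0, 18.0), (20.0, 10.0), (33.0, 22.0)]
ahu = accumulated_heat_units(example_days)
print(f"Accumulated heat units: {ahu:.1f}")
```

A single accumulated value like this can stand in for a full daily weather series as a model input, which is the kind of simplification the summary describes.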

Technical Abstract: The United States cotton industry is devoted to sustainable production strategies that reduce water, land, and energy consumption while enhancing soil health and cotton yield. Climate-smart agriculture solutions are being developed to increase yields and lower operational costs. However, crop yield prediction is challenging because of the complex, nonlinear, and interacting effects of cultivar, soil type, management, pests and diseases, climate, and weather patterns on crops. To address this challenge, machine learning (ML) methods were used to predict yield, taking into account climate change, soil diversity, cultivars, and fertilizer applications. Field data were collected across the southern U.S. cotton belt in the 1980s and 1990s. A second data set was generated with the process-based crop model GOSSYM to reflect the most recent effects of climate change over the last six years (2017–2022). We focused on three different locations within each of three southern states: Texas, Mississippi, and Georgia. Accumulated heat units (AHU) for each set of experimental data were used as an analogue for time-series weather data to reduce the number of computations. A Random Forest (RF) regressor, Support Vector Regression (SVR), a Light Gradient Boosting Machine (LightGBM) regressor, Multiple Linear Regression (MLR), and neural networks were evaluated. Cross-validation was employed to obtain an improved model that did not suffer from overfitting. The RF regressor achieved an accuracy of 97.75%, with an R² of roughly 0.98 and a root mean square error of 55.05 kg/ha. These results show that an ML approach can serve as a robust and simple model for supporting climate-smart cotton production.
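The abstract reports model skill as R² and root mean square error (RMSE). The sketch below shows the standard definitions of these two metrics in plain Python. The function names and the yield values are hypothetical and are not taken from the study, and the abstract does not define how the reported accuracy percentage was calculated, so that figure is not reproduced here.

```python
import math


def rmse(observed, predicted):
    """Root mean square error, in the same units as the inputs (e.g. kg/ha)."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted yields (kg/ha); not data from the study.
observed_yield = [1000.0, 1200.0, 1400.0, 1600.0]
predicted_yield = [1050.0, 1150.0, 1450.0, 1550.0]

model_rmse = rmse(observed_yield, predicted_yield)
model_r2 = r_squared(observed_yield, predicted_yield)
print(f"RMSE = {model_rmse:.2f} kg/ha, R^2 = {model_r2:.2f}")
```

In a cross-validated workflow, metrics such as these would typically be computed on each held-out fold and then averaged, so that the reported skill reflects data the model did not see during training.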