
Research Project: Practices for Management of Predominant Nematodes and Fungal Diseases for Sustainable Soybean Production

Location: Crop Genetics Research

Title: Predicting and interpreting cotton yield and its determinants under long-term conservation management practices using machine learning

Author
DHALIWAL, JASHANJEET - University of Tennessee
PANDAY, DINESHA - University of Tennessee
SAHA, DEBASISH - University of Tennessee
LEE, JAEHOON - University of Tennessee
JAGADAMMA, SINDHU - University of Tennessee
SCHAEFFER, SEAN - University of Tennessee
Mengistu, Alemu

Submitted to: Computers and Electronics in Agriculture
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 5/27/2022
Publication Date: 6/15/2022
Citation: Dhaliwal, J.K., Panday, D., Saha, D., Lee, J., Jagadamma, S., Schaeffer, S., Mengistu, A. 2022. Predicting and interpreting cotton yield and its determinants under long-term conservation management practices using machine learning. Computers and Electronics in Agriculture. https://doi.org/10.1016/j.compag.2022.107107.
DOI: https://doi.org/10.1016/j.compag.2022.107107

Interpretive Summary: Accurate crop yield prediction is critical for making strategic management decisions that sustain cotton yield without adversely impacting the environment. Our machine learning model predicted that cotton yield was more responsive to management variables (e.g., nitrogen fertilization rate, cover crop, and duration of no-till) than to soil or climate variables. It also predicted an optimal nitrogen application rate of 60 kilograms per hectare for cotton lint yield and showed that no-till enhanced yield after 15 years of practice. In addition, long-term adoption of hairy vetch as a cover crop was predicted to have the potential to increase cotton yield compared with a winter wheat cover crop or fallow. Among the climate variables, cotton lint yield was most affected by average maximum temperature and precipitation during the flowering to open boll period. These results highlight the need to build a more robust cotton yield model that can be used by extension agents or crop consultants.

Technical Abstract: Accurate predictions of crop yield are integral to effective tactical and strategic agricultural management decisions that sustain yield without adversely impacting the environment. Process-based simulation models are widely used to predict crop yields, but their application remains limited by the need for substantial expertise, intensive data and extensive calibration. Therefore, greater attention is currently being devoted to machine learning (ML) methods, which are more computationally expedient. We evaluated six ML algorithms (linear regression, ridge regression, lasso regression, random forest, XGBoost and artificial neural network (ANN)) to predict cotton (Gossypium spp.) yield and to determine the yield response to critical determinants, using long-term (1986-2018) data on management, climate, historical yield and point measurements of soil organic carbon (SOC) from a continuous no-till (NT) cotton cropping system in west Tennessee. Data from 1986-2015 were used for model training, hyperparameter tuning and testing, while data from 2016-2018 were used for independent model validation. Results showed that the tree-based models (random forest and XGBoost) outperformed the other models in predicting lint yield. The variable importance scores from the random forest model indicated that cotton yield was most responsive to management variables (nitrogen (N) fertilization rate, cover crop, and years since NT establishment), followed by soil (SOC) and climate variables at the study site. The model identified an optimal N application rate of 60 kg N ha⁻¹ for cotton lint yield, and also indicated that the yield benefits of NT can be achieved after 15 years of practice. The model predicted that the long-term adoption of hairy vetch (Vicia villosa), a legume cover crop, has the potential to increase cotton yield. Among the climate variables, cotton lint yield was most affected by average maximum temperature and precipitation during the flowering to open boll period. While random forest and XGBoost proved to be the most effective ML models in this study, multi-site data are needed to build a more robust model with greater generalization and interpretation capabilities under a wider prediction domain.
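
The data partitioning described in the abstract (1986-2015 for training, tuning and testing; 2016-2018 held out for independent validation) can be illustrated with a minimal Python sketch. This is not the authors' code. The record layout, the function names `split_by_year` and `train_test_split`, the random split within the 1986-2015 block, and the 20% test fraction are all assumptions made for illustration, and the records are synthetic placeholders rather than study data.

```python
import random
from typing import Dict, List, Tuple

Record = Dict[str, int]  # hypothetical record layout: {"year": ..., "plot": ...}


def split_by_year(
    records: List[Record],
    dev_years: Tuple[int, int] = (1986, 2015),
    holdout_years: Tuple[int, int] = (2016, 2018),
) -> Tuple[List[Record], List[Record]]:
    """Separate development records from the independent validation years."""
    dev = [r for r in records if dev_years[0] <= r["year"] <= dev_years[1]]
    holdout = [
        r for r in records if holdout_years[0] <= r["year"] <= holdout_years[1]
    ]
    return dev, holdout


def train_test_split(
    dev: List[Record], test_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[Record], List[Record]]:
    """Randomly split the development block into train and test subsets.

    The abstract does not state how this split was made; a seeded random
    split with a 20% test share is an assumption.
    """
    rng = random.Random(seed)
    shuffled = dev[:]  # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Synthetic placeholder records (4 hypothetical plots per year), not study data.
records = [{"year": y, "plot": p} for y in range(1986, 2019) for p in range(4)]

dev, holdout = split_by_year(records)
train, test = train_test_split(dev)
```

Holding out the final years as a block, rather than sampling them at random, means the validation set contains only seasons the model never saw during training or tuning, which is closer to how a yield model would be used in practice.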