Research Project: Sustainable Small Farm and Organic Grass and Forage Production Systems for Livestock and Agroforestry

Location: Dale Bumpers Small Farms Research Center

Title: Influence of sample size, model selection, and land use on prediction accuracy of soil properties

Authors
Safaee, Samira - Purdue University
Libohova, Zamir
Kladivko, Eileen - Purdue University
Brown, Andrew - Natural Resources Conservation Service (NRCS, USDA)
Winzeler, Hans-Edwin
Read, Quentin
Rahmani, Shams - Purdue University
Adhikari, Kabindra

Submitted to: Geoderma Regional
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 1/24/2024
Publication Date: 1/24/2024
Citation: Safaee, S., Libohova, Z., Kladivko, E., Brown, A., Winzeler, H.E., Read, Q.D., Rahmani, S., Adhikari, K. 2024. Influence of sample size, model selection, and land use on prediction accuracy of soil properties. Geoderma Regional. https://doi.org/10.1016/j.geodrs.2024.e00766.
DOI: https://doi.org/10.1016/j.geodrs.2024.e00766

Interpretive Summary: Soil organic matter (SOM) and soil nutrient-holding capacity are important for supporting plant growth and high yields. Precision management of small farms requires detailed maps of these properties. However, highly accurate maps require intensive field soil sampling, which is costly, especially for small farms with limited resources. Digital soil mapping, which combines field and laboratory data with remote sensing and models, promises to deliver such maps at a lower cost. Using a combination of different models, sampling densities, and high-resolution elevation data, the study found that increasing the number of samples improved the accuracy of the soil property maps. However, increasing the sampling density beyond a critical value did not further improve map accuracy, indicating that high-resolution maps of SOM and fertility can be produced with fewer samples, thus saving cost. The study also found that fields with diverse and intensive management practices require more samples than fields that are uniformly and less intensively managed. The findings from this study can help small farms create high-resolution maps at a lower cost while supporting the sustainable use of resources.

Technical Abstract: Digital soil mapping (DSM) uses models that integrate field and laboratory data with environmental factors to predict soils and soil properties. Prediction accuracy depends on the models used, the data collected, and the environmental factors. This study assesses the influence of sampling density and distribution, covariates, and modeling approach on the prediction accuracy of soil organic matter (SOM) and cation exchange capacity (CEC) at three sites in Indiana (ACRE, DPAC, and SEPAC) with different management intensities and sampling designs. Four models were used: Ordinary Kriging (OK) and three machine learning models, namely Cubist (CB), Random Forest (RF), and Regression Kriging (RK). The coefficient of determination (R²), root mean square error (RMSE), mean square error (MSE), concordance coefficient (ρc), and bias were used for the accuracy assessment. Prediction accuracy was influenced by site, sampling density, model, soil property, and their interactions. Site was the single largest source of significant variation, followed by sampling density and model, for both SOM and CEC. ACRE, with multiple fields and diverse, complex management practices, had a higher average RMSE and a wider range of RMSE for SOM than SEPAC and DPAC, which had uniform management. At ACRE, RMSE decreased (i.e., prediction accuracy improved) from 2.75 to 0.85 for SOM and from 17.38 to 3.61 for CEC as the number of samples increased from 36 (6 points/ha) to 66 (12 points/ha), but it did not change with further increases up to 146 samples. At SEPAC and DPAC, RMSE decreased only slightly beyond 5 points/ha (68 samples) and 1-2 points/ha (43 samples), respectively. All models performed poorly for SOM, with R² ranging from 0.13 to 0.38, while for CEC model performance varied widely, with R² from 0.11 to 0.64. Prediction accuracy was higher for CEC than for SOM at all sites. Overall, RF performed best and OK performed worst for both SOM and CEC.
The mean R² values across all sites were 0.35 (SOM) and 0.51 (CEC) for RF and 0.19 (SOM) and 0.17 (CEC) for OK. Spatial predictions from CB, RF, and RK were more detailed and conformed better to soil-landscape models than those from OK. The spatial differences in predicted SOM and CEC between sampling densities were larger in lower-elevation areas than in higher-elevation areas. The results of this study demonstrate that the choice of modeling approach is site-specific and depends on sampling density, soil property, and their interactions.
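The accuracy metrics named in the abstract (R², MSE, RMSE, bias, and ρc) are all computed from paired observed and predicted values at validation points. The Python sketch below is a minimal illustration under stated assumptions, not the authors' code: the function name `accuracy_metrics` is hypothetical, R² is taken as 1 − SS_res/SS_tot (some studies instead report the squared Pearson correlation), bias is assumed to be the mean of predicted minus observed, and ρc is computed as Lin's concordance correlation coefficient with population (1/n) moments.

```python
import math
from statistics import fmean


def accuracy_metrics(observed, predicted):
    """Validation metrics for paired observed/predicted soil values (e.g. SOM or CEC).

    Hypothetical helper. Assumed definitions (not taken from the study):
    R2 = 1 - SS_res/SS_tot, bias = mean(predicted - observed), and
    Lin's concordance coefficient computed with population (1/n) moments.
    """
    obs = [float(v) for v in observed]
    pred = [float(v) for v in predicted]
    if len(obs) != len(pred) or len(obs) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")

    n = len(obs)
    mean_obs, mean_pred = fmean(obs), fmean(pred)
    errors = [p - o for o, p in zip(obs, pred)]

    mse = fmean([e * e for e in errors])  # mean square error
    rmse = math.sqrt(mse)                 # root mean square error, same units as the property
    bias = fmean(errors)                  # assumed sign: > 0 means over-prediction

    ss_res = sum(e * e for e in errors)
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R2 is undefined")
    r2 = 1.0 - ss_res / ss_tot

    # Lin's concordance correlation coefficient: agreement with the 1:1 line
    var_obs = ss_tot / n
    var_pred = fmean([(p - mean_pred) ** 2 for p in pred])
    cov = fmean([(o - mean_obs) * (p - mean_pred) for o, p in zip(obs, pred)])
    rho_c = 2.0 * cov / (var_obs + var_pred + (mean_obs - mean_pred) ** 2)

    return {"R2": r2, "MSE": mse, "RMSE": rmse, "bias": bias, "rho_c": rho_c}


if __name__ == "__main__":
    # Hypothetical validation values, not data from the study.
    observed = [2.0, 3.0, 4.0, 5.0]
    predicted = [3.0, 4.0, 5.0, 6.0]  # every prediction is 1 unit too high
    print(accuracy_metrics(observed, predicted))
```

With the hypothetical values above, every prediction is one unit too high, so bias, MSE, and RMSE all equal 1, R² drops to about 0.2, and ρc falls below 1 even though the predictions are perfectly correlated with the observations. This is why a concordance measure is useful alongside R² and RMSE: it rewards agreement with the 1:1 line rather than correlation alone.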