Location: Environmental Microbial & Food Safety Laboratory
Title: Estimating parameters of empirical infiltration models from the global data set using the machine learning algorithm
Author
KIM, SEONGYUN - Orise Fellow
Pachepsky, Yakov
KARAHAN, GULAY - Cankiri Karatekin University
Sharma, Manan
Submitted to: Journal of Hydrology
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 1/29/2021
Publication Date: 4/1/2021
Citation: Kim, S., Pachepsky, Y.A., Karahan, G., Sharma, M. 2021. Estimating parameters of empirical infiltration models from the global data set using the machine learning algorithm. Journal of Hydrology. https://doi.org/10.31545/intagr/132922.
DOI: https://doi.org/10.31545/intagr/132922
Interpretive Summary: Water infiltration into soil must be predicted to address various environmental and agricultural issues. Many equations have been proposed to simulate infiltration. Coefficients of these equations, also called parameters, reflect local soil and vegetation conditions. Obtaining infiltration parameters from measurements is impractical in large-scale projects. The objective of this work was to use the Global Soil Water Infiltration database to determine whether the infiltration parameters can be estimated from readily available soil and vegetation data. The random forest machine learning algorithm was applied to obtain predictions of parameters for two popular infiltration equations. We found that the measurement method was by far the most influential predictor, and that, for a given method, knowing basic soil properties provided parameter estimates suitable for large-scale applications. The results of this work can be useful to a large group of environmental professionals who apply infiltration equations in their projects.
Technical Abstract: Water infiltration into soil is a key process of the hydrological cycle. Many equations have been proposed to simulate infiltration. Predictive models that estimate coefficients of infiltration equations from soil and vegetation properties have usually been developed for a single infiltration equation, and sometimes for several equations, from a single experimental dataset obtained with a single infiltration measurement method.
Our hypotheses were that (a) the accuracy of a coefficient prediction model for a particular infiltration equation may be better with the subset of data for which this infiltration equation performs better, and (b) the infiltration measurement method can be an influential predictor of the infiltration equation coefficients. The objective of this work was to test these hypotheses using the commonly employed Horton and Mezencev infiltration equations and the large international soil infiltration database SWIG. We were also interested in analyzing the importance of predictor variables in the models for the infiltration equation parameters, as determined by the random forest algorithm employed in this work. Soil-landscape properties available as predictor variables in 1825 datasets of the SWIG database were soil textural fractions, organic carbon content, bulk density, land use class, and the infiltration measurement method. The Horton model had lower RMSE than the Mezencev model in 928 datasets; the opposite was true in 758 datasets. Datasets obtained with the minidisk infiltrometer and with the double ring infiltrometer were the most numerous within the subsets with lower Horton and lower Mezencev RMSEs, respectively. The accuracy of the random forest models estimating Horton infiltration equation coefficients did not differ substantially between the parameter sets obtained from all data and those obtained from data where the Horton equation had lower RMSE. The same was true for the Mezencev equation. The infiltration measurement method appeared to be the most important predictor. The random forest model RMSE decreased by 2 to 25% when only datasets with the same infiltration measurement method were used. RMSE values of logarithms of the infiltration equation coefficients in these models ranged from 0.24 to 0.44. Apart from the measurement method, the most important predictors were soil textural fractions and organic carbon content. Bulk density and land use appeared to be relatively less important.
The global soil water infiltration database contains sufficient data to develop random forest models for the coefficients of the Mezencev and Horton infiltration equations with an accuracy that can be adequate for some applications. Developing such models for subsets of data that share the same infiltration measurement method can improve the accuracy of the infiltration equation coefficient estimates.
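The comparison of the two infiltration equations by RMSE, as described in the abstract, can be illustrated with a minimal Python sketch. It is not the authors' code. The abstract does not state the equation forms, so the sketch assumes the commonly cited cumulative forms: Horton, I(t) = fc*t + (f0 - fc)/k * (1 - exp(-k*t)), and Mezencev, I(t) = fc*t + a/(1 - b) * t^(1 - b). The function names, parameter values, and synthetic data are hypothetical, and the parameters are supplied directly rather than fitted to measurements.

```python
import math


def horton_cumulative(t, fc, f0, k):
    """Cumulative infiltration at time t under the assumed Horton form."""
    return fc * t + (f0 - fc) / k * (1.0 - math.exp(-k * t))


def mezencev_cumulative(t, fc, a, b):
    """Cumulative infiltration at time t under the assumed Mezencev form (0 < b < 1)."""
    return fc * t + a / (1.0 - b) * t ** (1.0 - b)


def rmse(observed, predicted):
    """Root-mean-square error between two equally long sequences."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def better_model(times, observed, horton_params, mezencev_params):
    """Return the name of the equation with the lower RMSE, plus both RMSEs."""
    h = rmse(observed, [horton_cumulative(t, *horton_params) for t in times])
    m = rmse(observed, [mezencev_cumulative(t, *mezencev_params) for t in times])
    return ("Horton" if h < m else "Mezencev"), h, m


# Hypothetical synthetic dataset generated from the Horton form itself,
# so the Horton equation reproduces it exactly (illustration only).
times = [0.1, 0.25, 0.5, 1.0, 2.0, 4.0]   # assumed units: hours
true_horton = (1.0, 6.0, 2.0)             # fc, f0, k (made-up values)
observed = [horton_cumulative(t, *true_horton) for t in times]

name, h_rmse, m_rmse = better_model(times, observed, true_horton, (1.0, 2.0, 0.5))
print(name, h_rmse, m_rmse)
```

In a real workflow, the parameters of each equation would first be estimated for every dataset, for example by nonlinear least squares, and the datasets would then be grouped by which equation gives the lower RMSE.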