
Research Project: Science and Technologies for the Sustainable Management of Western Rangeland Systems

Location: Range Management Research

Title: Exploring random forest machine learning and remote sensing data for streamflow prediction: An alternative approach to a process-based hydrologic modeling in a snowmelt-driven watershed

Author
IFTEKHARUL ISLAM, KHANDAKER - New Mexico State University
Elias, Emile
CARROLL, KENNETH - New Mexico State University
BROWN, CHRISTOPHER - New Mexico State University

Submitted to: Remote Sensing
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 8/8/2023
Publication Date: 8/11/2023
Citation: Iftekharul Islam, K., Elias, E.H., Carroll, K.C., Brown, C. 2023. Exploring random forest machine learning and remote sensing data for streamflow prediction: An alternative approach to a process-based hydrologic modeling in a snowmelt-driven watershed. Remote Sensing. 15(16). Article 3999. https://doi.org/10.3390/rs15163999.
DOI: https://doi.org/10.3390/rs15163999

Interpretive Summary: Physically based hydrologic models require significant effort and extensive information for development, calibration, and validation. This study explored random forest regression (RFR), a supervised machine learning (ML) model, as an alternative to the physically based Soil and Water Assessment Tool (SWAT) for predicting streamflow in the Rio Grande Headwaters near Del Norte, a snowmelt-dominated mountainous watershed of the Upper Rio Grande Basin. Remotely sensed data were used for the random forest machine learning (RFML) analysis, and RStudio was used for data processing and synthesis. The RFML model outperformed the SWAT model in accuracy and demonstrated its capability for predicting streamflow in this region. The RFML model was also evaluated over three training periods; its accuracy improved with longer training periods, implying that a model trained on a longer record better captures the variability of the input parameters and reproduces streamflow data more accurately. The variable importance measure of the RFML model (i.e., IncNodePurity) showed that snow depth and minimum temperature were consistently the top two predictors across all training periods. The paper also evaluates how well the SWAT model reproduces the watershed's streamflow data using a conventional approach. The SWAT model needed more time and data to set up and calibrate. It delivered acceptable performance in annual mean streamflow simulation, with satisfactory index of agreement (d), coefficient of determination (R²), and percent bias (PBIAS) values, but monthly simulation warrants further exploration and model adjustment. The study recommends exploring snowmelt runoff hydrologic processes, dust-driven sublimation effects, and more detailed topographic input parameters to update the SWAT snowmelt routine for better monthly flow estimation. The results provide a critical analysis for enhancing streamflow prediction, which is valuable for further research and water resources management, including in snowmelt-driven semi-arid regions.
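
The three goodness-of-fit statistics named above can be illustrated with a minimal Python sketch. It assumes the commonly used definitions: Willmott's index of agreement for d, the squared Pearson correlation for R², and PBIAS computed as 100 × Σ(observed − simulated) / Σ(observed), so that negative values indicate overestimation. The function names and the sample flow values are hypothetical and are not taken from the study, which may have used different formulations or software.

```python
def index_of_agreement(obs, sim):
    """Willmott's index of agreement d (assumed definition); 1 is a perfect match."""
    mean_obs = sum(obs) / len(obs)
    squared_error = sum((o - s) ** 2 for o, s in zip(obs, sim))
    potential_error = sum(
        (abs(s - mean_obs) + abs(o - mean_obs)) ** 2 for o, s in zip(obs, sim)
    )
    return 1.0 - squared_error / potential_error


def r_squared(obs, sim):
    """Coefficient of determination as the squared Pearson correlation."""
    n = len(obs)
    mean_obs = sum(obs) / n
    mean_sim = sum(sim) / n
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_sim = sum((s - mean_sim) ** 2 for s in sim)
    return cov ** 2 / (var_obs * var_sim)


def pbias(obs, sim):
    """Percent bias; with this sign convention, negative means overestimation."""
    return 100.0 * sum(o - s for o, s in zip(obs, sim)) / sum(obs)


if __name__ == "__main__":
    # Illustrative flows only (arbitrary units), not data from the study.
    observed = [10.0, 20.0, 30.0, 40.0]
    simulated = [11.0, 22.0, 33.0, 44.0]  # a uniform 10% overestimate
    print("d     =", round(index_of_agreement(observed, simulated), 4))
    print("R2    =", round(r_squared(observed, simulated), 4))
    print("PBIAS =", round(pbias(observed, simulated), 2))
```

In this toy case the simulated series is a scaled copy of the observed one, so R² is 1 even though every value is biased high. This is one reason R² is usually reported alongside d and PBIAS rather than on its own.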
