Research Project: Adapting Agricultural Production Systems and Soil and Water Conservation Practices to Climate Change and Variability in Southern Great Plains

Location: Agroclimate and Hydraulics Research Unit

Title: A framework of integrating heterogeneous data sources for non-stationary monthly streamflow prediction using a state-of-the-art deep learning model

Author
XU, WENXIN - Wuhan University
CHEN, JIE - Wuhan University
Zhang, Xunchang
XIONG, LIHUA - Wuhan University
CHEN, HUA - Wuhan University

Submitted to: Journal of Hydrology
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 9/27/2022
Publication Date: 10/29/2022
Citation: Xu, W., Chen, J., Zhang, X.J., Xiong, L., Chen, H. 2022. A framework of integrating heterogeneous data sources for non-stationary monthly streamflow prediction using a state-of-the-art deep learning model. Journal of Hydrology. 614(2022). Article 128599. https://doi.org/10.1016/j.jhydrol.2022.128599.
DOI: https://doi.org/10.1016/j.jhydrol.2022.128599

Interpretive Summary: Deep learning has been widely used for hydrological prediction, such as monthly streamflow prediction, and its performance usually depends on the abundance of training data. Although interest is growing in training deep learning models for monthly streamflow prediction with predictors from multiple data sources (e.g., streamflow observations, local meteorological data, and large-scale climate indexes), these predictors are usually drawn from historical periods. Such approaches are limited because they do not give the deep learning models non-stationary information about the future climate. Climate models can provide that information for the future period, which may be useful for monthly streamflow prediction. This study (1) investigates the added value of using information derived from global climate models (GCMs) for monthly streamflow prediction based on a state-of-the-art deep learning model, and (2) proposes a framework for integrating heterogeneous data sources for monthly streamflow prediction. The framework consists of five integral components: data collection, predictor combination, predictor selection, model construction, and results evaluation. The proposed framework is tested at six hydrological stations on the mainstream and six stations on tributaries of the Yangtze River. The results show that GCM forecasts are useful predictors that improve the accuracy of monthly streamflow predictions, especially at lead times of 1 and 3 months. Using GCM forecasts as predictors together with either historical meteorological data alone or historical streamflow observations plus meteorological data generally provides the best predictive performance. In addition, using large-scale climate indexes as ancillary information can improve predictive performance at a lead time of 6 months. Overall, the results show that (1) including predicted information from GCMs can improve monthly streamflow prediction, and (2) the way in which the heterogeneous data sources are combined is crucial. This work gives hydrologists, hydrological engineers, and water resource managers a useful means of predicting and managing streamflow.

Technical Abstract: Deep learning has been widely used for hydrological prediction, such as monthly streamflow prediction, and its performance usually depends on the abundance of training data. Although interest is growing in training deep learning models for monthly streamflow prediction with predictors from multiple data sources (e.g., streamflow observations, local meteorological data, and large-scale climate indexes), these predictors are usually drawn from historical periods. Such approaches are limited because non-stationary information about the future climate is not included in the deep learning models. Climate models can provide non-stationary climate information for the future period, which may be useful for monthly streamflow prediction. Therefore, the objectives of this study are to (1) investigate the added value of using predictors derived from global climate models (GCMs) for monthly streamflow prediction based on a state-of-the-art deep learning model, and (2) propose a framework for integrating heterogeneous data sources for monthly streamflow prediction. The framework consists of five integral components: data collection, predictor combination, predictor selection, model construction, and results evaluation. The proposed framework is tested at six hydrological stations on the mainstream and six stations on tributaries of the Yangtze River. The results show that GCM forecasts are useful predictors that improve the accuracy of monthly streamflow predictions, especially at lead times of 1 and 3 months. Using GCM forecasts as predictors together with either historical meteorological data alone or historical streamflow observations plus meteorological data generally provides the best predictive performance. In addition, using large-scale climate indexes as ancillary information can improve predictive performance at a lead time of 6 months. For lead times of 1, 3, and 6 months, two evaluation metrics, the Kling-Gupta efficiency (KGE) and the mean relative error (MRE), calculated for the best-performing predictor combinations are satisfactory for hydrological stations on both the mainstream and the tributaries: the median KGE is higher than 0.85 for the mainstream and 0.62 for the tributaries, and the median MRE is approximately 20% and 40%, respectively, suggesting that monthly streamflow predictions are better for the mainstream than for the tributaries. Overall, the results show that (1) including predicted information from GCMs can improve monthly streamflow prediction, and (2) the way in which the heterogeneous data sources are combined is crucial.
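
For readers unfamiliar with the two evaluation metrics named above, the following is a minimal sketch of how they are commonly computed. It is not taken from the paper: it assumes the original 2009 form of the KGE (linear correlation, variability ratio, and bias ratio) and assumes the MRE is the mean of absolute relative errors expressed as a percentage. The paper may use different variants. The function names and the sample flow values are hypothetical.

```python
import math
from statistics import fmean, pstdev


def kling_gupta_efficiency(observed, simulated):
    """KGE, assumed 2009 form: 1 - sqrt((r-1)^2 + (alpha-1)^2 + (beta-1)^2)."""
    if len(observed) != len(simulated) or len(observed) < 2:
        raise ValueError("need two series of equal length with at least 2 values")
    mean_obs, mean_sim = fmean(observed), fmean(simulated)
    std_obs, std_sim = pstdev(observed), pstdev(simulated)
    if std_obs == 0 or std_sim == 0 or mean_obs == 0:
        raise ValueError("KGE is undefined for constant series or zero mean flow")
    # Population covariance, consistent with the population standard deviations.
    cov = fmean([(o - mean_obs) * (s - mean_sim)
                 for o, s in zip(observed, simulated)])
    r = cov / (std_obs * std_sim)   # linear correlation
    alpha = std_sim / std_obs       # variability ratio
    beta = mean_sim / mean_obs      # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


def mean_relative_error(observed, simulated):
    """MRE in percent, assumed to be the mean of |sim - obs| / obs."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("need two non-empty series of equal length")
    if any(o == 0 for o in observed):
        raise ValueError("relative error is undefined for zero observed flow")
    return 100.0 * fmean([abs(s - o) / o for o, s in zip(observed, simulated)])


if __name__ == "__main__":
    # Hypothetical monthly flows (m^3/s), for illustration only.
    obs = [1200.0, 1500.0, 2300.0, 4100.0, 3800.0, 2600.0]
    sim = [1100.0, 1650.0, 2200.0, 3900.0, 4000.0, 2500.0]
    print(f"KGE = {kling_gupta_efficiency(obs, sim):.3f}")
    print(f"MRE = {mean_relative_error(obs, sim):.1f}%")
```

Under these definitions a perfect prediction gives KGE = 1 and MRE = 0%, so a higher KGE and a lower MRE both indicate better predictions, which matches the comparison between mainstream and tributary stations above.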