
Research Project: Design and Implementation of Monitoring and Modeling Methods to Evaluate Microbial Quality of Surface Water Sources Used for Irrigation

Location: Environmental Microbial & Food Safety Laboratory

Title: In-stream Escherichia coli modeling using high-temporal-resolution data with deep learning and process-based models

Author
item ABBAS, ATHER - ULSAN NATIONAL INSTITUTE OF SCIENCE AND TECHNOLOGY (UNIST)
item BAEK, SANGSOO - ULSAN NATIONAL INSTITUTE OF SCIENCE AND TECHNOLOGY (UNIST)
item SILVERA, NORBERT - THE SORBONNE UNIVERSITY
item SOULILEUTH, BOUNSAMAY - NATIONAL AGRICULTURE AND FORESTRY RESEARCH INSTITUTE (NAFRI)
item Pachepsky, Yakov
item RIBOLZI, OLIVIER - UNIVERSITY OF TOULOUSE
item BOITHIAS, LAURIE - UNIVERSITY OF TOULOUSE
item CHO, KYUNG HWA - ULSAN NATIONAL INSTITUTE OF SCIENCE AND TECHNOLOGY (UNIST)

Submitted to: Hydrology and Earth System Sciences
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 10/15/2021
Publication Date: 12/6/2021
Citation: Abbas, A., Baek, S., Silvera, N., Soulileuth, B., Pachepsky, Y.A., Ribolzi, O., Boithias, L., Cho, K. 2021. In-stream Escherichia coli modeling using high-temporal-resolution data with deep learning and process-based models. Hydrology and Earth System Sciences. https://doi.org/10.5194/hess-25-6185-2021.
DOI: https://doi.org/10.5194/hess-25-6185-2021

Interpretive Summary: The microbial quality of surface water is a major public concern, and E. coli concentrations are the commonly used indicator of potential pathogen presence. Process-based models have long been the primary tool for watershed-scale forecasts and management recommendations. Statistical and machine learning models have been used much less often because they require large volumes of data for development and testing. Recent advances in monitoring technology have made large volumes of high-frequency monitoring data available, sufficient for applying powerful deep-learning methods. Our goal was to compare the performance of the process-based HSPF model and the data-driven LSTM model using high-frequency hydrologic data collected in a stream in a small mountain catchment. The deep-learning LSTM models substantially outperformed the process-based HSPF model. These results demonstrate the value of high-frequency hydrologic data and indicate directions for designing and implementing modern hydrologic monitoring for the assessment and management of microbial stream water quality.
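Deep-learning models such as LSTMs are commonly trained on high-frequency series reshaped into fixed-length lookback windows, each paired with the value to be predicted. The Python sketch below illustrates that generic preprocessing step only; the function name, the predictor variables, and the window length are hypothetical, because this summary does not describe the study's actual input configuration. At 6 min time steps, for example, `lookback=10` would cover one hour of history.

```python
from typing import List, Sequence, Tuple


def make_lookback_windows(
    features: Sequence[Sequence[float]],
    target: Sequence[float],
    lookback: int,
) -> List[Tuple[List[List[float]], float]]:
    """Pair each target value with the `lookback` most recent feature rows.

    Hypothetical helper: features[t] holds predictor values observed at
    time step t (e.g. rainfall, water level); target[t] is the quantity to
    predict at the same step (e.g. E. coli concentration). The window
    includes the current step t.
    """
    if len(features) != len(target):
        raise ValueError("features and target must have the same length")
    if lookback < 1:
        raise ValueError("lookback must be at least 1")
    samples: List[Tuple[List[List[float]], float]] = []
    # The first complete window ends at index lookback - 1.
    for t in range(lookback - 1, len(target)):
        # Rows t - lookback + 1 .. t inclusive, copied as plain lists.
        window = [list(row) for row in features[t - lookback + 1 : t + 1]]
        samples.append((window, target[t]))
    return samples


# Usage: four time steps with two predictors each, window of two steps.
feats = [[0.0, 5.0], [1.0, 6.0], [2.0, 7.0], [3.0, 8.0]]
conc = [10.0, 11.0, 12.0, 13.0]
pairs = make_lookback_windows(feats, conc, lookback=2)
print(len(pairs))  # 3
print(pairs[0])    # ([[0.0, 5.0], [1.0, 6.0]], 11.0)
```

Including the current step's predictors in each window is a design choice for this sketch; a strictly forecasting setup would end the window at t - 1 instead.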

Technical Abstract: Contamination of surface waters by microbiological pollutants is a major public health concern. Although long-term, high-frequency monitoring of Escherichia coli (E. coli) can help prevent diseases caused by fecal pathogenic microorganisms, such monitoring is time-consuming and expensive. Process-based models are an alternative method for estimating fecal pathogenic microorganisms. However, the accuracy of process-based models remains limited because of the complex mechanistic relationships among hydrological and environmental variables. Meanwhile, with increasing data availability and computational power, data-driven models are being used more widely. In this study, we therefore simulated the transport of E. coli in a 0.6 km² tropical headwater catchment in Lao PDR using a deep learning model and a process-based model. The deep learning model was built using the long short-term memory (LSTM) technique, whereas the process-based model was constructed using the Hydrological Simulation Program–FORTRAN (HSPF). First, we calibrated both models for surface and subsurface flow. Then, we simulated E. coli transport at 6 min time steps with both the HSPF and LSTM models. The LSTM provided accurate results for surface and subsurface flow, with Nash–Sutcliffe Efficiency (NSE) values of 0.51 and 0.64, respectively, whereas the HSPF yielded NSE values of -0.7 and 0.59 for surface and subsurface flow, respectively. The LSTM also simulated E. coli concentration better, with an NSE of 0.35, whereas the HSPF showed unacceptable performance, with an NSE of -3.01. This is attributed to the limited ability of HSPF to capture E. coli dynamics under land-use change. The simulated E. coli concentration showed rise and drop patterns corresponding to annual changes in land use.
This study shows that deep learning-based models can be an efficient alternative to process-based models for simulating E. coli fate and transport at the catchment scale.
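The Nash–Sutcliffe Efficiency values reported above compare squared model errors with the variance of the observations: NSE = 1 - Σ(O_t - S_t)² / Σ(O_t - Ō)², where O_t and S_t are the observed and simulated values at time step t and Ō is the mean of the observations. NSE = 1 is a perfect fit, NSE = 0 means the model does no better than predicting the observed mean, and negative values (such as the HSPF results of -0.7 and -3.01) mean it does worse than that. The following is a minimal Python sketch of this standard formula; the function name is hypothetical, and this is not the authors' evaluation code.

```python
from typing import Sequence


def nash_sutcliffe_efficiency(
    observed: Sequence[float], simulated: Sequence[float]
) -> float:
    """Return NSE = 1 - sum((O - S)^2) / sum((O - mean(O))^2)."""
    if len(observed) != len(simulated):
        raise ValueError("observed and simulated must have the same length")
    if len(observed) < 2:
        raise ValueError("need at least two observations")
    mean_obs = sum(observed) / len(observed)
    # Sum of squared model errors against the observations.
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    # Sum of squared deviations of the observations from their own mean,
    # i.e. the error of the "predict the mean" benchmark.
    sst = sum((o - mean_obs) ** 2 for o in observed)
    if sst == 0:
        raise ValueError("observed series is constant; NSE is undefined")
    return 1.0 - sse / sst


obs = [1.0, 2.0, 3.0, 4.0]
print(nash_sutcliffe_efficiency(obs, obs))                   # 1.0 (perfect fit)
print(nash_sutcliffe_efficiency(obs, [2.5] * 4))             # 0.0 (same as the mean)
print(nash_sutcliffe_efficiency(obs, [4.0, 3.0, 2.0, 1.0]))  # -3.0 (worse than the mean)
```

Because the denominator is the variance of the observed series, NSE values are only directly comparable across models evaluated on the same observations, as with the HSPF and LSTM results for each variable here.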