Publication : USDA ARS

Publication #417910

Research Project: Improving Crop Performance and Precision Irrigation Management in Semi-Arid Regions through Data-Driven Research, AI, and Integrated Models

Location: Water Management and Systems Research

Title: Investigating the influence of measurement uncertainty on chlorophyll-a predictions as an indicator of harmful algal blooms in machine learning models

Author

	BUSARI, IBRAHIM - Clemson University
	SAHOO, DEBABRATA - Clemson University
	SUDHEER, K - Indian Institute Of Technology
	Harmel, Robert
	PRIVETTE, C - Clemson University
	SCHLAUTMAN, M - Clemson University
	SAWYER, C - Clemson University

Submitted to: Ecological Informatics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 7/19/2024
Publication Date: 9/1/2024
Citation: Busari, I., Sahoo, D., Sudheer, K.P., Harmel, R.D., Privette, C., Schlautman, M., Sawyer, C. 2024. Investigating the influence of measurement uncertainty on chlorophyll-a predictions as an indicator of harmful algal blooms in machine learning models. Ecological Informatics. 82. Article e102735. https://doi.org/10.1016/j.ecoinf.2024.102735.
DOI: https://doi.org/10.1016/j.ecoinf.2024.102735

Interpretive Summary: Advancements in data availability, including high-frequency, near-real-time multiparameter sensors, laboratory analysis, and in-situ and remote observations, have driven the development of machine learning (ML) models for applications such as monitoring toxic harmful algal blooms (HABs). However, the performance of these models is affected by uncertainty from two sources: the model's incomplete representation of the real world, and the measurements in the input dataset. For example, measurement uncertainty arises from sample collection, sensor drift, and laboratory analysis, as well as from data transcription and sample handling errors. While the impacts of model uncertainty are commonly addressed, the effect of measurement uncertainty is less studied owing to the limited availability of detailed measurement information. This study assesses the impact of measurement uncertainty on ML predictions of chlorophyll-a concentration as an index of HABs. The model predictions had mean absolute errors ranging from 0.16 to 5.19 µg/l and root mean squared errors ranging from 0.20 to 7.39 µg/l. The results demonstrate how well ML models can capture various HAB patterns when given diverse measurement variables. The findings will give researchers insight into how to lessen the impact of measurement uncertainty when using ML models as decision-support tools for HAB management.

Technical Abstract: Advancements in data availability, including high-frequency, near-real-time multiparameter sensors, laboratory analysis, and in-situ and remote observations, have driven the development of machine learning (ML) models for applications such as monitoring toxic harmful algal blooms (HABs). However, the performance of ML predictions is influenced both by model uncertainty, which stems from inherent model structures, and by errors associated with input dataset measurements. For example, measurement uncertainty arises from sample collection, sensor drift, laboratory analysis, and sample handling errors. While the impacts of model uncertainty are commonly addressed using probabilistic approaches, the effect of measurement uncertainty is less studied owing to the limited availability of detailed measurement information. This study assesses the impact of measurement uncertainty on ML predictions of chlorophyll-a concentration as an index of HABs in a mesotrophic lake. Using randomized subsets of the measured input datasets that mimic possible chlorophyll-a concentration distributions, the study built 1000 Random Forest (RF) and Support Vector Regression (SVR) models. An independent measured dataset was used to validate the ensemble models, allowing for model performance evaluation and the creation of prediction intervals to measure the propagated uncertainty. The model predictions had mean absolute errors (MAE) ranging from 0.16 µg/l to 5.19 µg/l and root mean squared errors (RMSE) ranging from 0.20 µg/l to 7.39 µg/l. The highest uncertainty coverage, 0.71, was observed in the RF model without chlorophyll-a sensor values as a predictor. The study found that training dataset size, which depends on whether the data are high-frequency or manually sampled, influences how much measurement uncertainty is covered. The results demonstrate how well ML models can capture various HAB patterns when given diverse measurement variables. The findings will give researchers insight into how to lessen the impact of measurement uncertainty when using ML models as decision-support tools for HAB management.
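
The workflow summarized in the abstract (many models fitted on randomized subsets, then validated on independent data with prediction intervals) can be illustrated with a minimal sketch. It is not the authors' implementation. Several things here are assumptions made for illustration: bootstrap resampling stands in for the study's randomized subsets, a one-variable least-squares line stands in for the RF and SVR models, the interval is taken as the 2.5th to 97.5th percentile of the ensemble, and "coverage" is read as the fraction of validation observations that fall inside that interval. All function names and the toy data are hypothetical.

```python
import math
import random


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x (a simple stand-in for RF/SVR)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx > 0 else 0.0  # guard against a degenerate resample
    return my - b * mx, b


def percentile(sorted_vals, q):
    """Percentile by linear interpolation; q is in [0, 100]."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def ensemble_predict(train_x, train_y, valid_x, n_models=1000, seed=0):
    """Fit one model per bootstrap resample; return predictions per model."""
    rng = random.Random(seed)
    n = len(train_x)
    preds = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        a, b = fit_line([train_x[i] for i in idx], [train_y[i] for i in idx])
        preds.append([a + b * x for x in valid_x])
    return preds


def evaluate(preds, valid_y, lower_q=2.5, upper_q=97.5):
    """Return (MAE, RMSE, coverage) of the ensemble on validation data.

    MAE and RMSE use the ensemble-mean prediction; coverage is the share of
    observations inside the [lower_q, upper_q] percentile band.
    """
    means, covered = [], 0
    for j, obs in enumerate(valid_y):
        col = sorted(p[j] for p in preds)
        means.append(sum(col) / len(col))
        if percentile(col, lower_q) <= obs <= percentile(col, upper_q):
            covered += 1
    errors = [m - o for m, o in zip(means, valid_y)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse, covered / len(valid_y)


if __name__ == "__main__":
    # Synthetic toy data, not from the study.
    rng = random.Random(1)
    train_x = [i / 10 for i in range(50)]
    train_y = [2 + 0.5 * x + rng.gauss(0, 0.3) for x in train_x]
    valid_x = [0.5, 1.5, 2.5, 3.5]
    valid_y = [2 + 0.5 * x + rng.gauss(0, 0.3) for x in valid_x]
    preds = ensemble_predict(train_x, train_y, valid_x, n_models=200)
    print(evaluate(preds, valid_y))
```

Under this reading, a coverage value such as the 0.71 reported above would mean that 71% of the independent observations fell within the ensemble's prediction band. The study's exact definition may differ.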
