Location: Water Management and Systems Research
Title: Investigating the influence of measurement uncertainty on chlorophyll-a predictions as an indicator of harmful algal blooms in machine learning models
Author
BUSARI, IBRAHIM - Clemson University
SAHOO, DEBABRATA - Clemson University
SUDHEER, K - Indian Institute Of Technology
Harmel, Robert
PRIVETTE, C - Clemson University
SCHLAUTMAN, M - Clemson University
SAWYER, C - Clemson University
Submitted to: Ecological Informatics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 7/19/2024
Publication Date: 9/1/2024
Citation: Busari, I., Sahoo, D., Sudheer, K.P., Harmel, R.D., Privette, C., Schlautman, M., Sawyer, C. 2024. Investigating the influence of measurement uncertainty on chlorophyll-a predictions as an indicator of harmful algal blooms in machine learning models. Ecological Informatics. 82. Article e102735. https://doi.org/10.1016/j.ecoinf.2024.102735.
DOI: https://doi.org/10.1016/j.ecoinf.2024.102735
Interpretive Summary: Advancements in data availability, including high-frequency, near real-time multiparameter sensors, laboratory analysis, and in-situ and remote observations, have driven the development of machine learning (ML) models for applications such as monitoring toxic harmful algal blooms (HABs). However, the performance of these models is influenced by two sources of uncertainty: the model's incomplete representation of the real world, and the uncertainty associated with input dataset measurements. For example, measurement uncertainty arises from sample collection, sensor drift, and laboratory analysis, as well as from data transcription and sample handling errors. While the impacts of model uncertainty are commonly addressed, the effect of measurement uncertainty is less studied because detailed measurement information is rarely available. This study assesses the impact of measurement uncertainty on the ML prediction of chlorophyll-a concentration as an index of HABs. Our findings showed that the model predictions have mean absolute errors ranging from 0.16 to 5.19 µg/l and root mean squared errors ranging from 0.20 to 7.39 µg/l. The results of this study demonstrate how well ML models can capture various HABs patterns when given diverse measurement variables. Our findings will give researchers insight into how to lessen the impact of measurement uncertainty when using ML models as decision-support tools for HABs management.
Technical Abstract: Advancements in data availability, including high-frequency, near real-time multiparameter sensors, laboratory analysis, and in-situ and remote observations, have driven the development of machine learning (ML) models for applications such as monitoring toxic harmful algal blooms (HABs). However, the performance of ML predictions is influenced both by model uncertainties due to inherent model structures and by errors associated with input dataset measurements. For example, measurement uncertainty arises from sample collection, sensor drift, laboratory analysis, and sample handling errors. While the impacts of model uncertainty are commonly addressed using probabilistic approaches, the effect of measurement uncertainty is less studied because detailed measurement information is rarely available. This study assesses the impact of measurement uncertainty on the ML prediction of chlorophyll-a concentration as an index of HABs in a mesotrophic lake. Using randomized subsets of input measured datasets that mimic possible chlorophyll-a concentration distributions, the study built 1000 Random Forest (RF) and Support Vector Regression (SVR) models. An independent measured dataset was used to validate the ensemble models, allowing for model performance evaluation and the creation of prediction intervals to measure the propagated uncertainty. Our findings showed that the model predictions have mean absolute errors (MAE) ranging from 0.16 µg/l to 5.19 µg/l and root mean squared errors (RMSE) ranging from 0.20 µg/l to 7.39 µg/l. The highest uncertainty coverage, 0.71, was observed in the RF model without chlorophyll-a sensor values as a predictor. The study found that training dataset size, which differs between the high-frequency sensor data and the manually sampled data, influences how much measurement uncertainty is covered. The results of this study demonstrate how well ML models can capture various HABs patterns when given diverse measurement variables. Our findings will give researchers insight into how to lessen the impact of measurement uncertainty when using ML models as decision-support tools for HABs management.
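The ensemble workflow described in the abstract (many models trained on randomized subsets, validated on an independent dataset, and summarized with MAE, RMSE, and prediction-interval coverage) can be illustrated with a minimal sketch. It is not the authors' implementation. Everything specific here is an assumption: the data are synthetic, a simple least-squares line stands in for the RF and SVR models, the function names are hypothetical, the prediction interval is taken as the 2.5th to 97.5th percentile of the ensemble predictions, and coverage is taken as the fraction of validation observations that fall inside that interval.

```python
import math
import random
import statistics

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (a stand-in for RF/SVR)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx > 0 else 0.0
    return my - b * mx, b

def percentile(sorted_vals, q):
    """Linearly interpolated percentile of a sorted list, q in [0, 1]."""
    pos = q * (len(sorted_vals) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac

def ensemble_predict(train, x_val, n_models=1000, subset_frac=0.8, seed=0):
    """Train one model per random training subset; return the per-point
    list of predictions from all ensemble members."""
    rng = random.Random(seed)
    k = max(2, int(subset_frac * len(train)))
    preds = [[] for _ in x_val]
    for _ in range(n_models):
        subset = rng.sample(train, k)  # random subset, without replacement
        a, b = fit_line([p[0] for p in subset], [p[1] for p in subset])
        for i, x in enumerate(x_val):
            preds[i].append(a + b * x)
    return preds

def evaluate(preds, y_val, lower_q=0.025, upper_q=0.975):
    """MAE and RMSE of the ensemble median, plus interval coverage."""
    centers, covered = [], 0
    for member_preds, y in zip(preds, y_val):
        s = sorted(member_preds)
        centers.append(statistics.median(s))
        if percentile(s, lower_q) <= y <= percentile(s, upper_q):
            covered += 1
    errors = [c - y for c, y in zip(centers, y_val)]
    mae = statistics.fmean([abs(e) for e in errors])
    rmse = math.sqrt(statistics.fmean([e * e for e in errors]))
    return mae, rmse, covered / len(y_val)

# Synthetic data (assumption): one hypothetical predictor and a noisy
# chlorophyll-a-like response in arbitrary units.
data_rng = random.Random(42)

def make_data(n):
    rows = []
    for _ in range(n):
        x = data_rng.uniform(0, 10)
        rows.append((x, 1.0 + 0.8 * x + data_rng.gauss(0, 0.5)))
    return rows

train = make_data(200)
val = make_data(50)  # independent validation set

preds = ensemble_predict(train, [x for x, _ in val], n_models=200)
mae, rmse, coverage = evaluate(preds, [y for _, y in val])
print(f"MAE={mae:.3f}  RMSE={rmse:.3f}  coverage={coverage:.2f}")
```

In this sketch the interval reflects only the spread among ensemble members, so coverage well below the nominal interval width is expected. Swapping `fit_line` for a real RF or SVR model would change the numbers but not the structure of the evaluation.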