Predicting Daily River Chlorophyll Concentrations at a Continental Scale

Journal

WATER RESOURCES RESEARCH
Volume 59, Issue 11

Publisher

AMER GEOPHYSICAL UNION
DOI: 10.1029/2022WR034215

Keywords

chlorophyll; rivers; machine learning

Eutrophication is a major threat to aquatic ecosystems, and predicting chlorophyll a concentrations can help assess the trophic state and algal abundance. In this study, a large dataset of chlorophyll a concentrations from 82 streams and rivers across the United States was compiled, and a machine learning algorithm was used to predict daily chlorophyll a concentrations. The model showed strong correlations with observed data, but had lower accuracy when applied to completely new sites. Turbidity and total nitrogen were identified as the most important variables for predicting chlorophyll a.
Eutrophication is one of the largest threats to aquatic ecosystems, and chlorophyll a measurements are relevant indicators of trophic state and algal abundance. Many studies have modeled chlorophyll a in rivers, but model development and testing have largely occurred at individual sites, which hampers the creation of generalized models capable of making broad-scale predictions. To address this gap, we compiled a large data set of chlorophyll a concentrations matched to other water quality, meteorological, and reach characteristic data for a diverse set of 82 streams and rivers across the United States. We used this data set and extreme gradient boosting, a tree-based machine learning algorithm, to predict daily chlorophyll a concentrations. Furthermore, we tested several practical considerations of broad-scale models, such as making predictions at sites not included in model training and the utility of in situ water quality data versus universally available, remotely estimated model inputs. Predictions were very strongly correlated with observations when compared against a randomly withheld subset of days; however, the model had lower accuracy when applied to completely novel sites withheld from model training. Turbidity and total nitrogen were the two most important variables for predicting chlorophyll a. Although in situ variables improved modeled estimates and were identified as more important during model interpretation, using only remote inputs still resulted in highly correlated predictions with small bias. Testing a model across many sites allowed for identification of common variables relevant to chlorophyll a and highlighted several challenges for applying data-driven models to new sites or at larger spatial scales.
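
The abstract contrasts two ways of evaluating the model: against a randomly withheld subset of days, and against whole sites withheld from training. The following is a minimal sketch of how those two splits differ, not the authors' code. The records are synthetic placeholders, and the function names, site labels, and 20% test fraction are all assumptions made for illustration. No model is fitted here; the sketch only shows why the second split is the stricter test.

```python
import random

def random_day_split(records, test_fraction=0.2, seed=0):
    """Withhold a random subset of site-days.

    Sites in the test set will usually also appear in the training set.
    """
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def withheld_site_split(records, test_fraction=0.2, seed=0):
    """Withhold entire sites so that no test site contributes training rows."""
    rng = random.Random(seed)
    sites = sorted({r["site"] for r in records})
    rng.shuffle(sites)
    n_test = max(1, int(round(len(sites) * test_fraction)))
    test_sites = set(sites[:n_test])
    train = [r for r in records if r["site"] not in test_sites]
    test = [r for r in records if r["site"] in test_sites]
    return train, test

def shared_sites(train, test):
    """Return the sites that occur in both the training and the test rows."""
    return {r["site"] for r in train} & {r["site"] for r in test}

# Synthetic placeholder records: 10 hypothetical sites, 30 days each.
records = [{"site": f"site_{s:02d}", "day": d}
           for s in range(10) for d in range(30)]

train_r, test_r = random_day_split(records)
train_s, test_s = withheld_site_split(records)

print("random-day split, sites in both sets:", len(shared_sites(train_r, test_r)))
print("withheld-site split, sites in both sets:", len(shared_sites(train_s, test_s)))
```

With a random-day split the model has already seen other days from each test site, so site-specific behavior can carry over into the test set. A withheld-site split removes that overlap, which is consistent with the lower accuracy the abstract reports for completely novel sites.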
