Article

Simultaneous feature engineering and interpretation: Forecasting harmful algal blooms using a deep learning approach

Journal

WATER RESEARCH
Volume 215

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.watres.2022.118289

Keywords

Reverse time attention mechanism; Decay mechanism; Recurrent neural network; Explainable artificial intelligence; Harmful algal bloom; Cyanobacteria

Funding

  1. National Research Foundation of Korea (NRF) - Korea government (MSIT) [2020R1A2C1009961]
  2. National Institute of Environmental Research (NIER) - Ministry of Environment (ME) of the Republic of Korea [NIER-2020-04-02-003]
  3. Korea Environmental Industry & Technology Institute (KEITI) through Aquatic Ecosystem Conservation Research Program - Korea Ministry of Environment (MOE) [202000305003]
  4. National Research Foundation of Korea [2020R1A2C1009961] Funding Source: Korea Institute of Science & Technology Information (KISTI), National Science & Technology Information Service (NTIS)

The article introduces a model called RETAIN-D for forecasting harmful algal blooms (HABs), which combines reverse time attention with a decay mechanism. The model improves temporal resolution, forecasting performance, and interpretability. The study shows that RETAIN-D outperforms other models in forecasting HABs and successfully captures the high variability and irregularities in the time series.
Routine monitoring for harmful algal blooms (HABs) is generally undertaken at a low temporal frequency (e.g., weekly to monthly), which is unsuitable for capturing highly dynamic variations in cyanobacteria abundance. Therefore, we developed a model incorporating reverse time attention with a decay mechanism (RETAIN-D) to forecast HABs with simultaneous improvements in temporal resolution, forecasting performance, and interpretability. The usefulness of RETAIN-D in forecasting HABs was illustrated by its application to two sites located in the lower sections of the Nakdong and Yeongsan rivers, South Korea, where HABs pose a critical water quality issue. Three variations of recurrent neural network models, i.e., long short-term memory (LSTM), gated recurrent unit (GRU), and reverse time attention (RETAIN), were adopted for comparisons of performance with RETAIN-D. Input features encompassing meteorological, hydrological, environmental, and biological factors were used to forecast cyanobacteria abundance (total cyanobacteria cell counts and cell counts of dominant cyanobacteria taxa). Incorporation of a decay mechanism into the deep learning structure in RETAIN-D allowed forecasts of HABs at a high (daily) temporal resolution without manual feature engineering, increasing the usefulness of the resulting forecasts for water quality and resources management. RETAIN-D yielded a high degree of accuracy (RMSE = 0.29-1.67, R² = 0.76-0.98, MAE = 0.18-1.14, SMAPE = 9.77-87.94% for test sets; on natural log scales) across model outputs and sites, successfully capturing high variability and irregularities in the time series. RETAIN-D showed higher accuracy than RETAIN (except for comparable accuracy in forecasting Microcystis abundance at the Nakdong River site) and outperformed LSTM and GRU across all model outputs and sites.
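The four accuracy measures quoted above (RMSE, R², MAE, SMAPE) can be reproduced for any pair of observed and forecast series. The following is a minimal sketch, not the authors' evaluation code: the function name `forecast_metrics` is hypothetical, the inputs are assumed to be already on the natural log scale, and SMAPE uses one common definition (mean of |error| divided by the average of the absolute observed and forecast values), which may differ from the variant used in the paper.

```python
import math

def forecast_metrics(observed, predicted):
    """Return RMSE, MAE, R^2 and SMAPE (%) for paired values.

    Assumption: both sequences are already log-transformed cell counts.
    """
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]

    ss_res = sum(e * e for e in errors)              # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(e) for e in errors) / n

    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    # R^2 is undefined when the observations have zero variance.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    # SMAPE: pairs where both values are zero contribute zero error.
    terms = [
        abs(p - o) / ((abs(o) + abs(p)) / 2.0)
        for o, p in zip(observed, predicted)
        if abs(o) + abs(p) > 0
    ]
    smape = 100.0 * sum(terms) / n

    return {"rmse": rmse, "mae": mae, "r2": r2, "smape": smape}


# Illustrative values only, not data from the study.
example = forecast_metrics([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
```

Because SMAPE divides by the magnitude of the values, it can become large when log-scale abundances are close to zero, which is one plausible reason for the wide SMAPE range reported alongside otherwise low RMSE and MAE.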
Ambient temperature had high importance in forecasting cyanobacteria abundance across all model outputs and sites, whereas the relative importance of other input features varied by the output and site. Increases in contributions with increasing irradiance, decreasing flow rates, and increasing residence time were more pronounced in summer than other seasons. Differences in the contributions of input features among different time steps (1 to 7 days prior to forecasting) were larger in the Yeongsan River site. RETAIN-D is applicable to a wide range of forecasting models that can benefit from improved temporal resolution, performance, and interpretability.
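The abstract does not spell out how the decay mechanism in RETAIN-D is formulated. As a rough illustration of the general idea (turning sparsely sampled inputs into a daily series without hand-crafted interpolation), the sketch below uses the input-decay rule known from GRU-D-style models: a missing value is filled with a blend of the last observation and the series mean, with the weight on the last observation decaying exponentially with the time since it was seen. This is an assumption, not the paper's method; the function name `decay_fill` and the fixed parameters `w` and `b` are hypothetical, and in a trained network such parameters would be learned rather than set by hand.

```python
import math

def decay_fill(series, w=0.5, b=0.0):
    """Fill missing entries (None) in a daily series with a decayed estimate.

    gamma = exp(-max(0, w * gap + b)), where gap is the number of days
    since the last observation; the filled value is
    gamma * last_observation + (1 - gamma) * mean.
    """
    observed = [v for v in series if v is not None]
    if not observed:
        raise ValueError("series must contain at least one observation")
    # Simplification: the mean is taken over this series. In practice it
    # would come from the training data to avoid using future information.
    mean = sum(observed) / len(observed)

    filled = []
    last = None   # most recent observed value
    gap = 0       # days elapsed since that observation
    for value in series:
        if value is not None:
            last, gap = value, 0
            filled.append(value)
            continue
        gap += 1
        if last is None:
            filled.append(mean)  # nothing observed yet: fall back to the mean
        else:
            gamma = math.exp(-max(0.0, w * gap + b))
            filled.append(gamma * last + (1.0 - gamma) * mean)
    return filled


# Illustrative values only: two observations separated by two unsampled days.
daily = decay_fill([2.0, None, None, 4.0])
```

In this toy series the filled values drift from the last observation (2.0) toward the mean (3.0) as the gap grows, which is the behaviour a decay term is meant to capture: the longer since a measurement, the less it is trusted.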
