4.7 Article

A transfer learning-based LSTM strategy for imputing large-scale consecutive missing data and its application in a water quality prediction system

Journal

JOURNAL OF HYDROLOGY
Volume 602, Issue -, Pages -

Publisher

ELSEVIER
DOI: 10.1016/j.jhydrol.2021.126573

Keywords

Water quality; Transfer learning; LSTM; TrAdaBoost; Large-scale consecutive missing data

Funding

  1. Leading Talents of Science and Technology Innovation in Zhejiang Provincial Ten Thousands Plan [2018R52040]
  2. International Science and Technology Cooperation Program of Zhejiang Province for Joint Research in High-tech Industry [2016C54007]

Water quality monitoring is critical for improving water resource protection and management, yet missing data is a common problem in this field. This paper introduces a novel algorithm, TrAdaBoost-LSTM, that addresses large-scale consecutive missing data and improves imputation accuracy.
In recent years, water quality monitoring has become crucial to improving water resource protection and management. Under the relevant laws and regulations, environmental protection agencies monitor lakes, streams, rivers, and other water bodies to assess water quality conditions. The valid, high-quality data generated by these monitoring activities help water resource managers understand existing pollution situations, energy consumption problems, and pollution control needs. However, real-world water quality data inevitably suffer from many problems caused by human error or system failures, and one of the most frequent is missing data. Although most existing studies have explored classic statistical methods or emerging machine learning and deep learning methods to fill gaps in data, these methods are not suitable for large-scale consecutive missing data. To address this issue, this paper proposes a novel algorithm called TrAdaBoost-LSTM, which combines deep learning, in the form of long short-term memory (LSTM), with instance-based transfer learning, in the form of TrAdaBoost. The model inherits the advantages of both components: the LSTM's ability to capture long-term dependencies in time series, and the flexibility of transfer learning to leverage related knowledge from complete datasets when filling in large-scale consecutive missing data. A case study involving dissolved oxygen concentrations obtained from water quality monitoring stations is conducted to validate the effectiveness and superiority of the proposed method. The results show that the proposed TrAdaBoost-LSTM model improves imputation accuracy by 15% to 25% compared with alternative models, based on the obtained performance indicators, and also provides potential ideas for similar future research.
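The abstract gives no implementation details, so the following is only a minimal sketch of the instance-reweighting idea behind TrAdaBoost in general, not the authors' code. It assumes a base learner (an LSTM in the paper, omitted here) has already produced per-instance errors scaled to [0, 1] for the source data (complete stations) and the target data (the station with gaps). The function name `tradaboost_reweight`, its parameters, and the clipping of the target error are illustrative assumptions.

```python
import math


def tradaboost_reweight(w_source, w_target, err_source, err_target, n_rounds):
    """One TrAdaBoost-style weight update (hypothetical helper).

    err_source / err_target are per-instance errors already scaled to [0, 1].
    Returns the new (source, target) weights, normalized to sum to 1 overall.
    """
    n = len(w_source)
    # Fixed factor <= 1: source instances that fit badly are down-weighted.
    beta_src = 1.0 / (1.0 + math.sqrt(2.0 * math.log(n) / n_rounds))

    # Weighted error measured on the target instances only.
    eps = sum(w * e for w, e in zip(w_target, err_target)) / sum(w_target)
    # Assumption: clip so that beta_tgt stays well-defined and below 1.
    eps = min(max(eps, 1e-10), 0.499)
    beta_tgt = eps / (1.0 - eps)

    # Source: shrink by error. Target: grow by error, as in AdaBoost.
    new_source = [w * beta_src ** e for w, e in zip(w_source, err_source)]
    new_target = [w * beta_tgt ** (-e) for w, e in zip(w_target, err_target)]

    total = sum(new_source) + sum(new_target)
    return [w / total for w in new_source], [w / total for w in new_target]


# Toy example: the first source instance fits the target poorly (error 0.9),
# so it loses weight relative to the second one (error 0.1).
ws, wt = tradaboost_reweight([0.25, 0.25], [0.25, 0.25],
                             [0.9, 0.1], [0.6, 0.1], n_rounds=10)
```

In a full loop, the base learner would be retrained on the reweighted instances each round, so source records that resemble the target series come to dominate training.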
