
The Data Synergy Effects of Time-Series Deep Learning Models in Hydrology

WATER RESOURCES RESEARCH (2022)

Journal

WATER RESOURCES RESEARCH

Volume 58, Issue 4

Publisher

AMER GEOPHYSICAL UNION

DOI: 10.1029/2021WR029583

Categories

Environmental Sciences; Limnology; Water Resources

Funding

Biological and Environmental Research program from the U.S. Department of Energy [DE-SC0016605]
Google AI Impacts Challenge Grant [1904-57775]

Smart Summary

Traditionally, statistical models in geoscientific disciplines like hydrology are built separately for different regions. However, in the era of big data and deep learning, it is often more beneficial to compile a large and heterogeneous dataset, train a global model on it, and compare that model against a local one. Deep learning models can accommodate different training instances and learn both the similarities and the differences between regions, which can improve performance. This phenomenon is called the "data synergy effect" and has implications for climate change impact assessment.

Abstract

When fitting statistical models to variables in geoscientific disciplines such as hydrology, it is a customary practice to stratify a large domain into multiple regions (or regimes) and study each region separately. Traditional wisdom suggests that models built for each region separately will have higher performance because of homogeneity within each region. However, each stratified model has access to fewer and less diverse data points. Here, through two hydrologic examples (soil moisture and streamflow), we show that conventional wisdom may no longer hold in the era of big data and deep learning (DL). We systematically examined an effect we call data synergy, where the results of the DL models improved when data were pooled together from characteristically different regions. The performance of the DL models benefited from modest diversity in the training data compared to a homogeneous training set, even with similar data quantity. Moreover, allowing heterogeneous training data makes eligible much larger training datasets, which is an inherent advantage of DL. A large, diverse data set is advantageous in terms of representing extreme events and future scenarios, which has strong implications for climate change impact assessment. The results here suggest the research community should place greater emphasis on data sharing.

Plain Language Summary

Traditionally with statistical methods used in hydrology, we split the domain into relatively homogeneous regimes, for each of which we can create a simple model, that is, a local model. However, in the era of big data machine learning, we show that this is often the opposite of what should be done. With deep learning models, we should compile a large and heterogeneous data set and compare the local model to a model trained with all the data (global model). Including heterogeneous training samples may improve the results compared to the local model. We call this the data synergy effect, and it results from two main factors. First, deep learning models are complex enough to accommodate different training instances, inherently permitting larger training datasets with more extreme events and changing trends. Second, with a heterogeneous training data set, deep learning models may be able to learn both the underlying similarities and factors contributing to differences between regions.
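The local-versus-global comparison described above can be sketched as a small experiment harness. This is a minimal illustration, not the authors' code: the function names (`local_set`, `pooled_set`, `compare`), the `(x, y)` sample layout, the toy numbers, and the mean-predictor placeholder model are all assumptions made here for illustration. It builds three training sets for a target region (local only, pooled but subsampled to the local size, and pooled with all data) so that the effect of diversity can be separated from the effect of quantity.

```python
import random


def local_set(data_by_region, region):
    """Training samples from the target region only (stratified setup)."""
    return list(data_by_region[region])


def pooled_set(data_by_region, size=None, seed=0):
    """Training samples pooled across all regions.

    If `size` is given, subsample without replacement so the pooled set
    matches a local set in quantity (size must not exceed the pool).
    """
    pooled = [s for r in sorted(data_by_region) for s in data_by_region[r]]
    if size is None:
        return pooled
    return random.Random(seed).sample(pooled, size)


def compare(fit, score, train_by_region, test_by_region, region):
    """Score three training designs on the same regional test set."""
    n_local = len(train_by_region[region])
    designs = {
        "local": local_set(train_by_region, region),
        "pooled_same_size": pooled_set(train_by_region, size=n_local),
        "pooled_all": pooled_set(train_by_region),
    }
    test = test_by_region[region]
    return {name: score(fit(train), test) for name, train in designs.items()}


# Placeholder model: predicts the training mean. A real study would plug in
# a time-series DL model here; this stand-in only exercises the harness.
def fit_mean(train):
    ys = [y for _, y in train]
    return sum(ys) / len(ys)


def mse(model, test):
    return sum((y - model) ** 2 for _, y in test) / len(test)


# Made-up toy samples (x, y) for two hypothetical regions.
train = {"A": [(0.1, 1.0), (0.2, 1.2)],
         "B": [(0.5, 3.0), (0.6, 3.4), (0.7, 3.2)]}
test = {"A": [(0.15, 1.1)], "B": [(0.55, 3.1)]}

results = compare(fit_mean, mse, train, test, "A")
```

With a mean predictor the pooled designs will not outperform the local one, so this toy run says nothing about data synergy itself. The point is the design: `pooled_same_size` isolates the contribution of diversity at similar data quantity, while `pooled_all` adds the benefit of a larger training set.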
