Article

On how data are partitioned in model development and evaluation: Confronting the elephant in the room to enhance model generalization*

ENVIRONMENTAL MODELLING & SOFTWARE (2023)

Journal

ENVIRONMENTAL MODELLING & SOFTWARE

Volume 167, Issue -, Pages -

Publisher

ELSEVIER SCI LTD

DOI: 10.1016/j.envsoft.2023.105779

Keywords

Model development; Model evaluation; Data partitioning; Data splitting; Calibration; Validation; Uncertainty; Earth systems

Categories

Computer Science, Interdisciplinary Applications; Engineering, Environmental; Environmental Sciences; Water Resources

Abstract

Models are crucial in advancing our understanding of Earth's physical nature and environmental systems, but their accuracy and reliability depend heavily on data, which are often partitioned without justification. This study highlights the significance of meticulously considering data partitioning in the model development and evaluation process, and its impact on model generalization. Flaws in existing data-splitting approaches are identified, and a forward-looking strategy is proposed to address this issue, leading to improved model generalization capabilities.

Models play a pivotal role in advancing our understanding of Earth's physical nature and environmental systems, aiding in their efficient planning and management. The accuracy and reliability of these models heavily rely on data, which are generally partitioned into subsets for model development and evaluation. Surprisingly, how this partitioning is done is often not justified, even though it determines what model we end up with, how we assess its performance and what decisions we make based on the resulting model outputs. In this study, we shed light on the paramount importance of meticulously considering data partitioning in the model development and evaluation process, and its significant impact on model generalization. We identify flaws in existing data-splitting approaches and propose a forward-looking strategy to effectively confront the elephant in the room, leading to improved model generalization capabilities.
