4.6 Article

Explaining Predictive Model Performance: An Experimental Study of Data Preparation and Model Choice

Journal

BIG DATA
Volume 11, Issue 3, Pages 199-214

Publisher

MARY ANN LIEBERT, INC
DOI: 10.1089/big.2021.0067

Keywords

artificial intelligence; data mining; data science; design of experiments; scientific method; United Network for Organ Sharing (UNOS)

This article evaluates the impact of data preparation and model selection on the predictive accuracy of models applied to a heart transplantation database. It highlights the interactions between early and later decisions and emphasizes the need for improved rigor in applied predictive research.
Although confirmatory modeling has dominated much of applied research in medical, business, and behavioral sciences, modeling large data sets with the goal of accurate prediction has become more widely accepted. The current practice for fitting predictive models is guided by heuristic-based modeling frameworks that lead researchers to make a series of often isolated decisions regarding data preparation and cleaning that may result in substandard predictive performance. In this article, we use an experimental design to evaluate the impact of six factors related to data preparation and model selection (techniques for numerical imputation, categorical imputation, encoding, subsampling for unbalanced data, feature selection, and machine learning algorithm) and their interactions on the predictive accuracy of models applied to a large, publicly available heart transplantation database. Our factorial experiment includes 10,800 models evaluated on 5 independent test partitions of the data. Results confirm that some decisions made early in the modeling process interact with later decisions to affect predictive performance; therefore, the current practice of making these decisions independently can negatively affect predictive outcomes. A key result of this case study is to highlight the need for improved rigor in applied predictive research. By using the scientific method to inform predictive modeling, we can work toward a framework for applied predictive modeling and a standard for reproducibility in predictive research.
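
The factorial structure described in the abstract can be illustrated with a minimal sketch that enumerates every combination of factor levels. The six factor names come from the abstract, but the levels listed here are hypothetical placeholders chosen for illustration; the abstract does not state which levels the study used, so this grid is not expected to reproduce the 10,800 models reported.

```python
from itertools import product

# Factor names follow the abstract; the levels are HYPOTHETICAL examples,
# not the levels used in the study.
FACTORS = {
    "numerical_imputation": ["mean", "median"],
    "categorical_imputation": ["mode", "missing_category"],
    "encoding": ["one_hot", "ordinal"],
    "subsampling": ["none", "undersample", "oversample"],
    "feature_selection": ["none", "filter"],
    "algorithm": ["logistic_regression", "random_forest"],
}


def full_factorial(factors):
    """Return one dict per run, covering every combination of levels."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]


design = full_factorial(FACTORS)
print(len(design))  # 2 * 2 * 2 * 3 * 2 * 2 = 96 runs
print(design[0])
```

Because every combination of levels appears in the design, interactions between early decisions (such as imputation) and later ones (such as the algorithm) can be estimated, which is not possible when each decision is tuned in isolation.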
