Article

Efficient Utilization of Missing Data in Cost-Sensitive Learning

Journal

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TKDE.2019.2956530

Keywords

Data models; Analytical models; Machine learning; Decision trees; Machine learning algorithms; Knowledge discovery; Computer science; Missing data imputation; cost-sensitive learning; decision tree; classification; imputation order; C4.5 algorithm; imputation cost

Funding

  1. China Key Research Program [2016YFB1000905]
  2. Natural Science Foundation of China [61836016, 61876046, 61573270, 61672177]
  3. Project of Guangxi Science and Technology [GuiKeAD17195062]
  4. Guangxi Collaborative Innovation Center of Multi-Source Information Integration and Intelligent Processing
  5. Program of Introducing 100 High-Level Overseas Talents
  6. Research Fund of Guangxi Key Lab of Multisource Information Mining and Security [18-A-01-01]


The study introduces a novel Data-driven Incremental Imputation Model (DIM) that imputes missing values effectively and economically by using all available information in the dataset. By taking into account both an economical criterion and effective imputation information, DIM outperforms the comparison methods in prediction accuracy and classification accuracy on UCI datasets.
Unlike previous imputation methods, which impute missing values in incomplete samples using only the information in the complete samples, this paper proposes a Data-driven Incremental imputation Model (DIM for short) that uses all available information in the data set to impute missing values economically, effectively, in order, and iteratively. To this end, we propose a scoring rule that ranks the missing features by taking into account both an economical criterion and effective imputation information. The economical criterion considers both the imputation cost and the discriminative ability of a feature, while the effective imputation information makes it possible to use all observed information in the data set, including previously imputed values, to impute the remaining missing values. During the imputation process, DIM first detects the need-not-impute samples to reduce imputation cost and noise, and then selects the top-ranked missing features to impute first. The imputation process imputes the missing features in order until all missing values are imputed or the imputation cost is exhausted. Experimental results on UCI data sets demonstrate the advantages of the proposed DIM over the comparison methods in terms of prediction accuracy and classification accuracy.
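The iterative, cost-bounded imputation loop described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's method: the scoring rule (discriminative ability divided by imputation cost), the mean-fill imputation step, and all function names here are assumptions introduced for the sketch; the paper's actual scoring rule, need-not-impute detection, and model-based imputer are more involved.

```python
import numpy as np

def rank_missing_features(mask, costs, disc):
    """Rank features that contain missing values.

    Illustrative scoring rule (an assumption, not the paper's formula):
    higher discriminative ability and lower imputation cost rank first.
    """
    scores = {}
    for j in range(mask.shape[1]):
        if mask[:, j].any():
            scores[j] = disc[j] / costs[j]
    return sorted(scores, key=scores.get, reverse=True)

def incremental_impute(X, mask, costs, disc, budget):
    """Impute missing features in ranked order until the budget runs out.

    Each imputed column becomes observed information that later columns
    could draw on (here the fill is a simple column mean of observed
    values, a stand-in for a learned imputation model).
    """
    X, mask = X.copy(), mask.copy()
    spent = 0.0
    for j in rank_missing_features(mask, costs, disc):
        rows = np.where(mask[:, j])[0]
        cost_j = costs[j] * len(rows)
        if spent + cost_j > budget:
            break  # imputation cost exhausted
        fill = X[~mask[:, j], j].mean()  # mean of observed entries
        X[rows, j] = fill
        mask[rows, j] = False  # imputed values now count as observed
        spent += cost_j
    return X, mask, spent
```

The budget check before each column is what makes the procedure "economical": expensive, weakly discriminative features are simply never imputed when the budget is tight.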

