Article

Efficient Utilization of Missing Data in Cost-Sensitive Learning

Journal

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TKDE.2019.2956530

Keywords

Data models; Analytical models; Machine learning; Decision trees; Machine learning algorithms; Knowledge discovery; Computer science; Missing data imputation; cost-sensitive learning; decision tree; classification; imputation order; C4.5 algorithm; imputation cost

Funding

  1. China Key Research Program [2016YFB1000905]
  2. Natural Science Foundation of China [61836016, 61876046, 61573270, 61672177]
  3. Project of Guangxi Science and Technology [GuiKeAD17195062]
  4. Guangxi Collaborative Innovation Center of Multi-Source Information Integration and Intelligent Processing
  5. Program of Introducing 100 High-Level Overseas Talents
  6. Research Fund of Guangxi Key Lab of Multisource Information Mining and Security [18-A-01-01]


The study introduces a novel Data-driven Incremental Imputation Model (DIM) that imputes missing values effectively and economically by using all available information in the dataset. By taking into account both an economical criterion and effective imputation information, DIM outperforms the comparison methods in prediction accuracy and classification accuracy on UCI datasets.
Unlike previous imputation methods, which impute missing values in incomplete samples using only the information in the complete samples, this paper proposes a Data-driven Incremental imputation Model (DIM for short) that uses all available information in the data set to impute missing values economically, effectively, in order, and iteratively. To this end, we propose a scoring rule that ranks the missing features by taking into account both an economical criterion and effective imputation information. The economical criterion considers both the imputation cost and the discriminative ability of a feature, while the effective imputation information makes it possible to use all observed information in the data set, including previously imputed values, to impute the remaining missing values. During the imputation process, DIM first detects the need-not-impute samples to reduce imputation cost and noise, and then selects the top-ranked missing features to impute first. The imputation process imputes the missing features in order until all missing values are imputed or the imputation cost is exhausted. Experimental results on UCI data sets demonstrate the advantages of the proposed DIM over the comparison methods in terms of prediction accuracy and classification accuracy.
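The iterative, cost-bounded imputation loop described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's method: the scoring rule (discriminative ability divided by imputation cost), the mean-fill imputation step, and all function names here are assumptions introduced for the sketch; the paper's actual scoring rule, need-not-impute detection, and model-based imputer are more involved.

```python
import numpy as np

def rank_missing_features(mask, costs, disc):
    """Rank features that contain missing values.

    Illustrative scoring rule (an assumption, not the paper's formula):
    higher discriminative ability and lower imputation cost rank first.
    """
    scores = {}
    for j in range(mask.shape[1]):
        if mask[:, j].any():
            scores[j] = disc[j] / costs[j]
    return sorted(scores, key=scores.get, reverse=True)

def incremental_impute(X, mask, costs, disc, budget):
    """Impute missing features in ranked order until the budget runs out.

    Each imputed column becomes observed information that later columns
    could draw on (here the fill is a simple column mean of observed
    values, a stand-in for a learned imputation model).
    """
    X, mask = X.copy(), mask.copy()
    spent = 0.0
    for j in rank_missing_features(mask, costs, disc):
        rows = np.where(mask[:, j])[0]
        cost_j = costs[j] * len(rows)
        if spent + cost_j > budget:
            break  # imputation cost exhausted
        fill = X[~mask[:, j], j].mean()  # mean of observed entries
        X[rows, j] = fill
        mask[rows, j] = False  # imputed values now count as observed
        spent += cost_j
    return X, mask, spent
```

The budget check before each column is what makes the procedure "economical": expensive, weakly discriminative features are simply never imputed when the budget is tight.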

