4.4 Article

Bayesian networks for imputation in classification problems

期刊

JOURNAL OF INTELLIGENT INFORMATION SYSTEMS
卷 29, 期 3, 页码 231-252

出版社

SPRINGER
DOI: 10.1007/s10844-006-0016-x

关键词

missing values; Bayesian networks; data mining

向作者/读者索取更多资源

Missing values are an important problem in data mining. In order to tackle this problem in classification tasks, we propose two imputation methods based on Bayesian networks. These methods are evaluated in the context of both prediction and classification tasks. We compare the obtained results with those achieved by classical imputation methods (Expectation-Maximization, Data Augmentation, Decision Trees, and Mean/Mode). Our simulations were performed by means of four datasets (Congressional Voting Records, Mushroom, Wisconsin Breast Cancer and Adult), which are benchmarks for data mining methods. Missing values were simulated in these datasets by means of the elimination of some known values. Thus, it is possible to assess the prediction capability of an imputation method, comparing the original values with the imputed ones. In addition, we propose a methodology to estimate the bias inserted by imputation methods in classification tasks. In this sense, we use four classifiers (One Rule, Naive Bayes, J4.8 Decision Tree and PART) to evaluate the employed imputation methods in classification scenarios. Computing times consumed to perform imputations are also reported. Simulation results in terms of prediction, classification, and computing times allow us performing several analyses, leading to interesting conclusions. Bayesian networks have shown to be competitive with classical imputation methods.

作者

我是这篇论文的作者
点击您的名字以认领此论文并将其添加到您的个人资料中。

评论

主要评分

4.4
评分不足

次要评分

新颖性
-
重要性
-
科学严谨性
-
评价这篇论文

推荐

暂无数据
暂无数据