4.7 Article

Effective Bayesian-network-based missing value imputation enhanced by crowdsourcing

Journal

KNOWLEDGE-BASED SYSTEMS
Volume 190

Publisher

ELSEVIER
DOI: 10.1016/j.knosys.2019.105199

Keywords

Missing values; Bayesian network; Crowdsourcing

Funding

  1. NSFC, PR China [U1509216, U1866602, 61602129]
  2. Microsoft Research Asia, PR China

Incompleteness is one of the most serious data quality problems that arise during data collection. Traditional imputation methods rely mostly on statistical and machine learning techniques, but both are limited in accuracy because they lack sufficient information about the missing data. To obtain more information, recent methods resort to external sources such as knowledge bases or the World Wide Web. Such sources may still be of limited help, since knowledge bases may contain little information about the missing values and the web may contain too much noise. To tackle these issues, this paper adopts crowdsourcing as the external source, where hundreds of thousands of ordinary workers on the platform can provide high-quality information based on contextual knowledge and human cognitive ability. To reduce the cost, a joint model is proposed for imputation that integrates crowdsourcing into the process of Bayesian inference. We first construct a Bayesian network over the attributes in the dataset and then infer the missing attribute values by Bayesian inference. To improve the accuracy of the inference, we outsource a small number of informative tasks to crowd workers, where the informative tasks are selected based on uncertainty and influence. The proposed approach is evaluated in extensive experiments on real-world datasets with a simulated crowd and two real crowdsourcing platforms. The experimental results show that our approach achieves better performance than other imputation approaches. (C) 2019 Elsevier B.V. All rights reserved.
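The task-selection idea in the abstract (send the crowd only the missing cells that are both uncertain and influential) can be illustrated with a minimal sketch. The abstract does not define how uncertainty or influence is measured, so everything here is an assumption rather than the authors' method: uncertainty is taken to be the Shannon entropy of a cell's inferred posterior, influence is a hypothetical per-cell weight supplied by the caller, and the names `entropy`, `select_tasks`, `posteriors`, `influence`, and `budget` are illustrative only.

```python
import math


def entropy(dist):
    """Shannon entropy (in bits) of a discrete distribution {value: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def select_tasks(posteriors, influence, budget):
    """Return up to `budget` missing cells ranked by entropy * influence.

    posteriors: {cell_id: {candidate_value: probability}}, assumed to come
                from Bayesian inference over the network (not shown here).
    influence:  {cell_id: non-negative weight}; a hypothetical stand-in for
                how strongly a cell affects other inferences. Cells without
                an entry default to a weight of 1.0.
    """
    scores = {
        cell: entropy(dist) * influence.get(cell, 1.0)
        for cell, dist in posteriors.items()
    }
    # Highest score first; ties are broken by cell id for determinism.
    ranked = sorted(scores, key=lambda cell: (-scores[cell], cell))
    return ranked[:budget]


# Toy posteriors for three missing cells (made-up values for illustration).
posteriors = {
    "r1.city": {"Paris": 0.5, "Lyon": 0.5},
    "r2.city": {"Paris": 0.9, "Lyon": 0.1},
    "r3.zip": {"75001": 0.5, "69001": 0.5},
}
influence = {"r1.city": 2.0, "r2.city": 3.0, "r3.zip": 1.0}

print(select_tasks(posteriors, influence, budget=2))
```

With these toy numbers, `r1.city` ranks first because it is maximally uncertain and fairly influential, while `r2.city` overtakes `r3.zip` on influence despite its lower entropy. A product of the two scores is only one possible way to combine them.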
