4.7 Article

Possibilistic Data Cleaning

期刊

出版社

IEEE COMPUTER SOC
DOI: 10.1109/TKDE.2021.3062318

关键词

Cleaning; Uncertainty; Maintenance engineering; Possibility theory; Data models; Semantics; Relational databases; Algorithm; constraint; data cleaning; database; fixed-parameter tractable; intractability; possibility theory; vertex cover

向作者/读者索取更多资源

Classical data cleaning methods often ignore the uncertainty of the data and constraints. We propose a non-invasive qualitative approach to uncertainty, which improves the effectiveness of data cleaning.
Classical data cleaning performs a minimal set of operations on the data to satisfy the given integrity constraints. Often, this minimization is equivalent to vertex cover, for example when tuples can be removed due to the violation of functional dependencies. Classically, the uncertainty of tuples and constraints is ignored. We propose not to view data as dirty but the uncertainty information about data. Since probabilities are often unavailable and their treatment is limited due to correlations in the data, we investigate a qualitative approach to uncertainty. Tuples are assigned degrees of possibility with which they occur, and constraints are assigned degrees of certainty that say to which tuples they apply. Our approach is non-invasive to the data as we lower the possibility degree of tuples as little as possible. The new resulting qualitative version of vertex cover remains NP-hard. We establish an algorithm that is fixed-parameter tractable in the size of the qualitative vertex cover. Experiments with synthetic and real-world data show that our algorithm outperforms the classical algorithm proportionally to the available number of uncertainty degrees. By mining the certainty degrees with which constraints hold, our framework becomes applicable even when uncertainty information is unavailable.

作者

我是这篇论文的作者
点击您的名字以认领此论文并将其添加到您的个人资料中。

评论

主要评分

4.7
评分不足

次要评分

新颖性
-
重要性
-
科学严谨性
-
评价这篇论文

推荐

暂无数据
暂无数据