Article

Incremental Discovery of Imprecise Functional Dependencies

Publisher

ASSOC COMPUTING MACHINERY
DOI: 10.1145/3397462

Keywords

Functional dependency; discovery algorithm; tuple insertions; incremental discovery; parallelism

Functional dependencies (FDs) are a kind of metadata used to assess data quality and to perform data cleaning operations. However, to achieve robustness to data errors, it has been necessary to devise imprecise versions of functional dependencies, yielding relaxed functional dependencies (RFDs). Among them is the class of RFDs relaxing on the extent, i.e., those admitting the possibility that an FD holds on a subset of the data. Several algorithms for automatically discovering RFDs from big data collections have been proposed in the literature, and they perform well given the inherent complexity of the problem. However, most of them can discover RFDs only by batch processing the entire dataset. This is not suitable in the era of big data, where a database instance can grow at high velocity and the insertion of new data can invalidate previously holding RFDs. Thus, incremental discovery algorithms are needed that update the set of holding RFDs upon data insertions without reprocessing the entire dataset. To this end, in this article we propose an incremental discovery algorithm for RFDs relaxing on the extent. Upon the insertion of new tuples, it validates candidate RFDs and generates possibly new RFD candidates, while limiting the size of the overall search space. Experimental results show that the proposed algorithm achieves extremely good performance on real-world datasets.
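
To make the idea of "validating a candidate RFD upon tuple insertion" concrete, the sketch below tracks a single candidate X -> Y relaxed on the extent. It is not the algorithm proposed in the article (the abstract does not give its details) and it does not cover candidate generation. It assumes that "holds on a subset of data" is measured as the largest fraction of tuples that can be kept so that X -> Y holds exactly (the complement of the well-known g3 error measure), compared against a user-chosen threshold. The class name IncrementalExtentRFD and its parameters are hypothetical. Per-group counters let each insertion update the measure without rescanning earlier tuples.

```python
from collections import Counter, defaultdict
from typing import Hashable, Sequence


class IncrementalExtentRFD:
    """Hypothetical helper tracking one candidate X -> Y relaxed on the extent.

    Assumption: the candidate holds if at least `threshold` of all tuples can be
    kept so that X -> Y holds exactly (i.e. g3 error <= 1 - threshold).
    """

    def __init__(self, lhs: Sequence[int], rhs: int, threshold: float) -> None:
        self.lhs = tuple(lhs)                # attribute positions forming X
        self.rhs = rhs                       # attribute position of Y
        self.threshold = threshold           # minimum fraction of tuples to keep
        self.groups = defaultdict(Counter)   # X-value -> Counter of Y-values
        self.best = {}                       # X-value -> most frequent Y count
        self.kept = 0                        # sum of self.best.values()
        self.n = 0                           # number of tuples inserted so far

    def insert(self, row: Sequence[Hashable]) -> None:
        """Update the counters for one new tuple without touching older tuples."""
        x = tuple(row[i] for i in self.lhs)
        y = row[self.rhs]
        counts = self.groups[x]
        counts[y] += 1
        old_best = self.best.get(x, 0)
        # A count only grows by one, so the group maximum either stays the same
        # or becomes the count that was just incremented.
        if counts[y] > old_best:
            self.best[x] = counts[y]
            self.kept += counts[y] - old_best
        self.n += 1

    def coverage(self) -> float:
        """Largest fraction of tuples on which X -> Y holds exactly."""
        return 1.0 if self.n == 0 else self.kept / self.n

    def holds(self) -> bool:
        return self.coverage() >= self.threshold


# Usage: candidate City -> Country with a 75% extent threshold.
rfd = IncrementalExtentRFD(lhs=[0], rhs=1, threshold=0.75)
for row in [("Rome", "IT"), ("Rome", "IT"), ("Paris", "FR"), ("Rome", "VA")]:
    rfd.insert(row)
print(rfd.coverage(), rfd.holds())
```

After the four insertions, three of the four tuples can be kept (dropping ("Rome", "VA")), so the candidate still meets the 0.75 threshold; one more conflicting tuple would invalidate it. A full incremental discovery algorithm would maintain many such candidates at once and, as the abstract notes, generate new candidates when existing ones are invalidated.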
