4.4 Article

A Survey on Blocking Technology of Entity Resolution

Journal

JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY
Volume 35, Issue 4, Pages 769-793

Publisher

SCIENCE PRESS
DOI: 10.1007/s11390-020-0350-4

Keywords

blocking construction; blocking optimization; data linkage; entity resolution

Funding

  1. National Natural Science Foundation of China [61772268]
  2. Fundamental Research Funds for the Central Universities of China [NS2018057, NJ2018014]

Entity resolution (ER) is a significant task in data integration that aims to detect all entity profiles corresponding to the same real-world entity. Because ER has inherently quadratic complexity, blocking was proposed to make it tractable: it offers an approximate solution that clusters similar entity profiles into blocks, so that pairwise comparisons need only be performed inside each block, which reduces the computational cost of ER. This paper presents a comprehensive survey of existing blocking technologies. We summarize and analyze all classic blocking methods, with emphasis on different blocking construction and optimization techniques. We find that traditional blocking-based ER methods, which depend on a fixed schema, may not work in highly heterogeneous information spaces. Using schema information flexibly is therefore of great significance for efficiently processing data with the new characteristics of this era. Machine learning is an important tool for ER, but efficient end-to-end machine learning methods still need to be explored. We also summarize the most promising directions for future work, including real-time blocking ER, incremental blocking ER, and deep learning for ER.
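
The core idea of blocking described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper: the blocking key (first three letters of the last name token), the function names, and the sample profiles are all hypothetical and chosen only to show how grouping profiles into blocks cuts the number of pairwise comparisons below the quadratic all-pairs count.

```python
from collections import defaultdict
from itertools import combinations


def blocking_key(profile):
    # Hypothetical key: first three letters of the last name token, lowercased.
    # Real blocking schemes use far more robust keys; this is only illustrative.
    return profile["name"].split()[-1][:3].lower()


def build_blocks(profiles):
    # Group profile ids by their blocking key.
    blocks = defaultdict(list)
    for pid, profile in profiles.items():
        blocks[blocking_key(profile)].append(pid)
    return blocks


def candidate_pairs(blocks):
    # Only profiles that share a block are compared with each other.
    pairs = set()
    for ids in blocks.values():
        for a, b in combinations(sorted(ids), 2):
            pairs.add((a, b))
    return pairs


# Made-up sample profiles (assumption, not data from the paper).
profiles = {
    1: {"name": "John Smith"},
    2: {"name": "Jon Smith"},
    3: {"name": "J. Smithe"},
    4: {"name": "Mary Jones"},
    5: {"name": "M. Jones"},
    6: {"name": "Alan Turing"},
}

blocks = build_blocks(profiles)
pairs = candidate_pairs(blocks)

n = len(profiles)
all_pairs = n * (n - 1) // 2  # comparisons needed without blocking

print(f"comparisons without blocking: {all_pairs}")
print(f"comparisons with blocking:    {len(pairs)}")
```

The saving comes at the price of approximation: two profiles of the same entity that receive different keys never land in the same block and are therefore never compared, which is why the choice of blocking construction and optimization technique matters.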
