Proceedings Paper

Towards Practical Open Knowledge Base Canonicalization

Publisher

ASSOC COMPUTING MACHINERY
DOI: 10.1145/3269206.3271707

Keywords

Entity Resolution; Open Knowledge Base; Canonicalization

Funding

  1. Hong Kong Research Grant Council GRF [17254016, 17253616]

An Open Information Extraction (OIE) system processes textual data to extract assertions, which are structured data typically represented as <subject; relation; object> triples. An Open Knowledge Base (OKB) is a collection of such assertions. We study the problem of canonicalizing an OKB, defined as mapping each name (a textual term such as "the rockies" or "colorado rockies") to a canonical form (such as "rockies"). Galarraga et al. [18] proposed a hierarchical agglomerative clustering algorithm that uses canopy clustering to tackle the canonicalization problem. The algorithm was shown to be very effective. However, it is not efficient enough to handle large OKBs in practice because of the large number of similarity score computations it requires. We propose the FAC algorithm for solving the canonicalization problem. FAC employs pruning techniques to avoid unnecessary similarity computations, and bounding techniques to efficiently approximate and identify small similarities. In our experiments, FAC registers orders-of-magnitude speedups over other approaches.
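
To make the canonicalization task concrete, the following is a minimal sketch and not the FAC algorithm or the method of [18]. It assumes token-level Jaccard similarity, a canopy step that only compares names sharing at least one token, single-link merging via union-find in place of full hierarchical agglomerative clustering, and the shortest name in each cluster as the canonical form. The function names and the threshold value are hypothetical choices made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def tokens(name):
    """Lowercased whitespace tokens of a name (assumed tokenization)."""
    return frozenset(name.lower().split())


def jaccard(a, b):
    """Token-level Jaccard similarity between two names."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb)


def canonicalize(names, threshold=0.5):
    """Map each name to a canonical form (illustrative sketch only)."""
    # Canopy step: a name joins one canopy per token, so only names
    # that share a token are ever compared. This skips pairs whose
    # Jaccard similarity is necessarily zero.
    canopies = defaultdict(set)
    for name in names:
        for tok in tokens(name):
            canopies[tok].add(name)

    # Union-find over names, used for single-link merging.
    parent = {name: name for name in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Merge any two names in the same canopy that are similar enough.
    for members in canopies.values():
        for a, b in combinations(sorted(members), 2):
            if jaccard(a, b) >= threshold:
                parent[find(a)] = find(b)

    # Collect clusters and pick the shortest member as canonical form
    # (ties broken alphabetically). This choice is an assumption.
    clusters = defaultdict(list)
    for name in names:
        clusters[find(name)].append(name)
    return {
        name: min(group, key=lambda s: (len(s), s))
        for group in clusters.values()
        for name in group
    }


mapping = canonicalize(["the rockies", "colorado rockies", "rockies", "denver"])
print(mapping)
```

In this sketch the canopy step plays the role of a crude pruning filter. Even so, every pair inside a canopy is still scored, which is the kind of cost the abstract describes as impractical for large OKBs.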
