Synthesizing Entity Matching Rules by Examples

PROCEEDINGS OF THE VLDB ENDOWMENT (2017)

Journal

PROCEEDINGS OF THE VLDB ENDOWMENT

Volume 11, Issue 2, Pages 189-202

Publisher

ASSOC COMPUTING MACHINERY

DOI: 10.14778/3149193.3149199

Abstract

Entity matching (EM) is a critical part of data integration. We study how to synthesize entity matching rules from positive and negative matching examples. The core of our solution is program synthesis, a powerful tool to automatically generate rules (or programs) that satisfy a given high-level specification, via a predefined grammar. This grammar describes a General Boolean Formula (GBF) that can include arbitrary attribute matching predicates combined by conjunctions (boolean AND), disjunctions (boolean OR) and negations (boolean NOT), and is expressive enough to model EM problems, from capturing arbitrary attribute combinations to handling missing attribute values. The rules in the form of GBF are more concise than traditional EM rules represented in Disjunctive Normal Form (DNF). Consequently, they are more interpretable than decision trees and other machine learning algorithms that output deep trees with many branches. We present a new synthesis algorithm that, given only positive and negative examples as input, synthesizes EM rules that are effective over the entire dataset. Extensive experiments show that we outperform other interpretable rules (e.g., decision trees with low depth) in effectiveness, and are comparable with non-interpretable tools (e.g., decision trees with high depth, gradient-boosting trees, random forests and SVM).
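To make the rule form concrete, the following is a minimal sketch of how a GBF over attribute matching predicates might be represented and evaluated on a pair of records. It illustrates only the shape of such a rule, not the synthesis algorithm of the paper. The class names (`Pred`, `IsNull`, `Not`, `And`, `Or`), the `jaccard` and `exact` similarity functions, the attribute names, and the 0.6 threshold are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# A record maps attribute names to values; None marks a missing value.
Record = dict[str, Optional[str]]


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity (hypothetical choice of similarity)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def exact(a: str, b: str) -> float:
    """1.0 when the two strings are identical, else 0.0."""
    return 1.0 if a == b else 0.0


@dataclass(frozen=True)
class Pred:
    """Attribute matching predicate: sim(r1[attr], r2[attr]) >= threshold."""
    attr: str
    sim: Callable[[str, str], float]
    threshold: float


@dataclass(frozen=True)
class IsNull:
    """True when the attribute is missing in either record."""
    attr: str


@dataclass(frozen=True)
class Not:
    child: object


@dataclass(frozen=True)
class And:
    children: tuple


@dataclass(frozen=True)
class Or:
    children: tuple


def evaluate(f: object, r1: Record, r2: Record) -> bool:
    """Recursively evaluate a formula on a pair of records."""
    if isinstance(f, Pred):
        a, b = r1.get(f.attr), r2.get(f.attr)
        if a is None or b is None:
            return False  # assumption: a predicate on a missing value fails
        return f.sim(a, b) >= f.threshold
    if isinstance(f, IsNull):
        return r1.get(f.attr) is None or r2.get(f.attr) is None
    if isinstance(f, Not):
        return not evaluate(f.child, r1, r2)
    if isinstance(f, And):
        return all(evaluate(c, r1, r2) for c in f.children)
    if isinstance(f, Or):
        return any(evaluate(c, r1, r2) for c in f.children)
    raise TypeError(f"unknown formula node: {f!r}")


# Hypothetical rule: names are similar, AND
# (phones are both present and equal, OR a phone is missing and cities are equal).
rule = And((
    Pred("name", jaccard, 0.6),
    Or((
        And((Not(IsNull("phone")), Pred("phone", exact, 1.0))),
        And((IsNull("phone"), Pred("city", exact, 1.0))),
    )),
))

r1 = {"name": "John A Smith", "phone": "555-0100", "city": "Boston"}
r2 = {"name": "John Smith", "phone": "555-0100", "city": None}
r3 = {"name": "John Smith", "phone": None, "city": "Boston"}
r4 = {"name": "John Smith", "phone": "555-0199", "city": "Boston"}

matches = [evaluate(rule, r1, other) for other in (r2, r3, r4)]
```

In this sketch the name predicate appears once and is shared by both branches. An equivalent DNF rule would have to repeat it in every conjunct, which is one way to see why a nested formula can be shorter.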
