Article

The address connector: noninvasive synchronization of hierarchical data sources

Journal

KNOWLEDGE AND INFORMATION SYSTEMS
Volume 37, Issue 3, Pages 639-663

Publisher

SPRINGER LONDON LTD
DOI: 10.1007/s10115-012-0582-x

Keywords

Data quality; Record linkage; Entity resolution; Hierarchical data; Trees; Approximate matching; Similarity query; Residential addresses

Funding

  1. SyRA (Synchronizing Residential Addresses) project of the Free University of Bozen-Bolzano, Italy

Different databases often store information about the same or related objects in the real world. To enable collaboration between these databases, data items that refer to the same object must be identified. Residential addresses are data of particular interest as they often provide the only link between related pieces of information in different databases. Unfortunately, residential addresses that describe the same location might vary considerably and hence need to be synchronized. Non-matching street names and addresses stored at different levels of granularity make address synchronization a challenging task. Common approaches assume an authoritative reference set and correct residential addresses according to the reference set. Often, however, no reference set is available, and correcting addresses with different granularity is not possible. We present the address connector, which links residential addresses that refer to the same location. Instead of correcting addresses according to an authoritative reference set, the connector defines a lookup function for residential addresses. Given a query address and a target database, the lookup returns all residential addresses in the target database that refer to the same location. The lookup supports addresses that are stored with different granularity. To align the addresses of two matching streets, we use a global greedy address-matching algorithm that guarantees a stable matching. We define the concept of address containment that allows us to correctly link addresses with different granularity. The evaluation of our solution on real-world data from a municipality shows that our solution is both effective and efficient.
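The two mechanisms named in the abstract, global greedy matching and address containment, can be illustrated with a minimal sketch. It rests on several assumptions that do not come from the paper. An address is modeled as a tuple of hierarchy levels ordered coarse to fine, for example street, house number, entrance. Similarity is `difflib.SequenceMatcher.ratio` on lowercased strings, and the threshold is `0.5`. An address "contains" another when it is a prefix of it. The helpers `similarity`, `greedy_stable_matching`, `contains` and `lookup` are hypothetical. The paper applies its greedy algorithm to align the addresses of two matching streets. Here the same greedy scheme is shown on street names for brevity.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple

# Assumption: an address is a tuple of hierarchy levels, coarse to fine,
# e.g. ("via roma", "5", "A") = street, house number, entrance.
Address = Tuple[str, ...]


def similarity(a: str, b: str) -> float:
    """Hypothetical string similarity in [0, 1]; the paper's measure may differ."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def greedy_stable_matching(left: List[str], right: List[str],
                           threshold: float = 0.5) -> Dict[str, str]:
    """Global greedy one-to-one matching: take the most similar free pair first.

    Because the similarity is symmetric, a pair (l, r) that was skipped had l or r
    already matched to a partner at least as similar, so no blocking pair exists
    and the matching is stable.
    """
    candidates = sorted(
        ((similarity(l, r), l, r) for l in left for r in right),
        key=lambda t: (-t[0], t[1], t[2]),  # best first; deterministic tie-break
    )
    matched: Dict[str, str] = {}
    used_right = set()
    for score, l, r in candidates:
        if score < threshold:
            break  # all remaining pairs are below the threshold
        if l not in matched and r not in used_right:
            matched[l] = r
            used_right.add(r)
    return matched


def contains(a: Address, b: Address) -> bool:
    """Assumed containment: a (coarser or equal) is a prefix of b."""
    return len(a) <= len(b) and b[:len(a)] == a


def lookup(query: Address, target: List[Address],
           street_map: Dict[str, str]) -> List[Address]:
    """Return target addresses that refer to the same location as the query.

    The query street is translated through the street matching. A target address
    is linked when either address contains the other, so addresses stored at
    different granularity still connect.
    """
    street = street_map.get(query[0])
    if street is None:
        return []
    q: Address = (street,) + query[1:]
    return [t for t in target if contains(q, t) or contains(t, q)]


if __name__ == "__main__":
    streets = greedy_stable_matching(["Via Roma", "Corso Italia"],
                                     ["via roma", "corso italia"])
    target = [("via roma", "5"), ("via roma", "5", "A"),
              ("via roma", "7"), ("corso italia", "5")]
    print(lookup(("Via Roma", "5", "A"), target, streets))
```

A global greedy pass, rather than letting every element independently choose its best candidate, keeps the matching one-to-one. Two source streets therefore cannot both claim the same target street. Linking by containment in both directions means a coarse query such as `("Via Roma", "5")` also retrieves the finer `("via roma", "5", "A")`, and the reverse holds as well, without rewriting either database.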
