4.7 Article

Enumerating dissimilar minimum cost perfect and error-correcting bipartite matchings for robust data matching

Journal

INFORMATION SCIENCES
Volume 596, Issue -, Pages 202-221

Publisher

ELSEVIER SCIENCE INC
DOI: 10.1016/j.ins.2022.03.017

Keywords

Enumeration with dissimilarity; Data matching; Bipartite matching; Dynamic programming; Graph edit distance

Ask authors/readers for more resources

This paper introduces a method for computing K dissimilar minimum cost bipartite matchings and proves that the problem is NP-hard. It presents heuristics based on greedy dynamic programming and shows that these techniques outperform existing algorithms in terms of dissimilarity of the obtained matchings and improve the upper bounds of state-of-the-art algorithms for graph edit distance computation based on bipartite data matching.
Matchings between objects from two datasets, domains, or ontologies have to be computed in various application scenarios. One often used meta-approach -which we call bipartite data matching-is to leverage domain knowledge for defining costs between the objects that should be matched, and to then use the classical Hungarian algorithm to compute a minimum cost bipartite matching. In this paper, we introduce and study the problem of enumerating K dissimilar minimum cost bipartite matchings. We formalize this problem, prove that it is NP-hard, and present heuristics based on greedy dynamic programming. The presented enumeration techniques are not only interesting in themselves, but also mitigate an often overlooked shortcoming of bipartite data matching, namely, that it is sensitive w. r. t. the storage order of the input data. Extensive experiments show that our enumeration heuristics clearly outperform existing algorithms in terms of dissimilarity of the obtained matchings, that they are effective at rendering bipartite data matching approaches more robust w. r. t. random storage order, and that they significantly improve the upper bounds of state-of-the art algorithms for graph edit distance computation that are based on bipartite data matching.(c) 2022 Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

4.7
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

No Data Available
No Data Available