Toponym matching through deep neural networks

INTERNATIONAL JOURNAL OF GEOGRAPHICAL INFORMATION SCIENCE (2018)

Volume 32, Issue 2, Pages 324-348

Publisher

TAYLOR & FRANCIS LTD

DOI: 10.1080/13658816.2017.1390119

Keywords

Toponym matching; duplicate detection; approximate string matching; deep neural networks; recurrent neural networks; geographic information retrieval

Funding

Trans-Atlantic Platform for the Social Sciences and Humanities, through the Digging into Data project [HJ-253525]
Trans-Atlantic Platform for the Social Sciences and Humanities, through Reassembling the Republic of Letters networking program (EU COST Action) [IS1310]
Fundacao para a Ciencia e Tecnologia (FCT) [PTDC/EEI-SCR/1743/2014, CMUP-ERI/TIC/0046/2014]
Fundacao para a Ciencia e Tecnologia (FCT) through the INESC-ID multi-annual funding from the PIDDAC program [UID/CEC/50021/2013]
ESRC [ES/R003890/1] Funding Source: UKRI
Economic and Social Research Council [ES/R003890/1] Funding Source: researchfish
Fundação para a Ciência e a Tecnologia [CMUP-ERI/TIC/0046/2014, PTDC/EEI-SCR/1743/2014] Funding Source: FCT

Abstract

Toponym matching, i.e. pairing strings that represent the same real-world location, is a fundamental problem for several practical applications. The current state of the art relies on string similarity metrics, either specifically developed for matching place names or integrated within methods that combine multiple metrics. However, these methods all rely on common substrings to establish similarity, and they do not effectively capture the character replacements involved in toponym changes due to transliterations or to changes in language and culture over time. In this article, we present a novel matching approach, leveraging a deep neural network to classify pairs of toponyms as either matching or nonmatching. The proposed network architecture uses recurrent nodes to build representations from the sequences of bytes that correspond to the strings that are to be matched. These representations are then combined and passed to feed-forward nodes, finally leading to a classification decision. We present the results of a wide-ranging evaluation of the performance of the proposed method, using a large dataset collected from the GeoNames gazetteer. These results show that the proposed method can significantly outperform individual similarity metrics from previous studies, as well as previous methods based on supervised machine learning for combining multiple metrics.
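
The data flow described in the abstract (byte sequences, recurrent encoding, combination, feed-forward layer, classification) can be illustrated with a minimal sketch. This is not the authors' implementation: the plain tanh recurrence, the hidden size, the way the two representations are combined, and all names (`to_bytes`, `TinyPairClassifier`) are assumptions chosen for brevity, and the weights are random and untrained, so the output probability carries no meaning until the model is fitted on labelled pairs.

```python
import math
import random


def to_bytes(name, max_len=30):
    """Encode a toponym as a list of UTF-8 byte values (truncation length is an assumption)."""
    return list(name.encode("utf-8"))[:max_len]


def make_matrix(rows, cols, rng, scale=0.1):
    """Small random weight matrix; a real model would learn these values."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class TinyPairClassifier:
    """Hypothetical, untrained pair classifier that mirrors the described pipeline."""

    def __init__(self, hidden=8, seed=0):
        rng = random.Random(seed)
        self.hidden = hidden
        self.embed = make_matrix(256, hidden, rng)      # one input vector per byte value
        self.w_rec = make_matrix(hidden, hidden, rng)   # recurrent weights
        self.w_ff = make_matrix(hidden, 4 * hidden, rng)  # feed-forward layer
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(hidden)]

    def encode(self, name):
        """Run a plain tanh recurrence over the bytes and return the final state."""
        h = [0.0] * self.hidden
        for b in to_bytes(name):
            x = self.embed[b]
            h = [
                math.tanh(x[i] + sum(self.w_rec[i][j] * h[j] for j in range(self.hidden)))
                for i in range(self.hidden)
            ]
        return h

    def predict(self, a, b):
        """Return a value in (0, 1); meaningful only after training."""
        ha, hb = self.encode(a), self.encode(b)
        # Assumed combination: concatenation, absolute difference, element-wise product.
        combined = (
            ha + hb
            + [abs(p - q) for p, q in zip(ha, hb)]
            + [p * q for p, q in zip(ha, hb)]
        )
        # Feed-forward layer with ReLU, followed by a sigmoid output unit.
        z = [max(0.0, sum(w * c for w, c in zip(row, combined))) for row in self.w_ff]
        logit = sum(w * v for w, v in zip(self.w_out, z))
        return 1.0 / (1.0 + math.exp(-logit))


if __name__ == "__main__":
    model = TinyPairClassifier()
    print(to_bytes("Köln"))                  # non-ASCII characters span several bytes
    print(model.predict("Lisboa", "Lisbon"))  # untrained output, for illustration only
```

Working on bytes rather than characters keeps the input vocabulary fixed at 256 symbols regardless of script, which is one plausible reason for the byte-level input the abstract mentions.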
