4.5 Article

Location Extraction from Social Media: Geoparsing, Location Disambiguation, and Geotagging

Journal

ACM TRANSACTIONS ON INFORMATION SYSTEMS
Volume 36, Issue 4, Pages -

Publisher

ASSOC COMPUTING MACHINERY
DOI: 10.1145/3202662

Keywords

Algorithms; Measurement; Performance; Design; Experimentation; Location extraction; toponym extraction; information extraction; geoparsing; geocoding; geotagging; location; toponym; disambiguation; social media; benchmark

Funding

  1. 7th Framework Program of the European Commission [610928, 258723]
  2. Horizon 2020 Program of the European Commission [687786]

Ask authors/readers for more resources

Location extraction, also called toponym extraction, is a field covering geoparsing, extracting spatial representations from location mentions in text, and geotagging, assigning spatial coordinates to content items. This article evaluates five best-of-class location extraction algorithms. We develop a geoparsing algorithm using an OpenStreetMap database, and a geotagging algorithm using a language model constructed from social media tags and multiple gazetteers. Third-party work evaluated includes a DBpedia-based entity recognition and disambiguation approach, a named entity recognition and Geonames gazetteer approach, and a Google Geocoder API approach. We perform two quantitative benchmark evaluations, one geoparsing tweets and one geotagging Flicks posts, to compare all approaches. We also perform a qualitative evaluation recalling top N location mentions from tweets during major news events. The OpenStreetMap approach was best (F1 0.90+) for geoparsing English, and the language model approach was best (F1 0.66) for Turkish. The language model was best (F1@1 km 0.49) for the geotagging evaluation. The map database was best (R@20 0.60+) in the qualitative evaluation. We report on strengths, weaknesses, and a detailed failure analysis for the approaches and suggest concrete areas for further research.

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

4.5
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

No Data Available
No Data Available