Article

Construction of a high-precision general geographical location words dataset

Journal

COMPUTER STANDARDS & INTERFACES
Volume 84, Issue -, Pages -

Publisher

ELSEVIER
DOI: 10.1016/j.csi.2022.103692

Keywords

Geographical location word; Point of interest; Toponym; Administrative division


This paper proposes a framework for constructing high-precision general geographical location word datasets and presents a Chinese dataset built with it, called GeoCN. GeoCN addresses the lack of a Chinese geographical location word lexicon that combines diverse categories, high accuracy, and robustness.
Geographical location words (GLWs) are words associated with geographical locations. They are an important foundation for text data processing and for location inference on social networks. In this paper, we propose a framework for constructing high-precision general GLW datasets and use it to construct a Chinese GLW dataset named GeoCN. To some extent, GeoCN addresses the lack of a Chinese GLW lexicon with diverse categories, high accuracy, and broad applicability. GeoCN consists of three parts: a) point of interest (POI) data collected through an electronic map API, b) administrative division data built from the national information platform, and c) GLW data expanded and filtered by automated procedures and manual processing. We establish a GLW glossary for each administrative region and map each GLW to its location. GeoCN covers 34 provincial-level, 392 prefecture-level, and 3,160 county-level administrative regions in China. It contains 1,763,476 GLWs, and the compressed file size is 117 MB.
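The per-region glossary and the word-to-location mapping described in the abstract can be illustrated with a minimal sketch. The abstract does not state GeoCN's file format or schema, so everything here is an assumption made for illustration: the `Region` class, the `build_index` and `locate` helpers, and the toy glossary entries are hypothetical and are not taken from the dataset.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """An administrative region as a three-level path (hypothetical schema)."""
    province: str
    prefecture: str = ""
    county: str = ""


def build_index(glossaries):
    """Invert per-region glossaries into a word -> set-of-regions index."""
    index = defaultdict(set)
    for region, words in glossaries.items():
        for word in words:
            index[word].add(region)
    return index


def locate(index, text):
    """Return (region, match_count) pairs for glossary words found in text,
    sorted by descending match count."""
    votes = defaultdict(int)
    for word, regions in index.items():
        if word in text:  # plain substring match, no word segmentation
            for region in regions:
                votes[region] += 1
    return sorted(votes.items(), key=lambda kv: -kv[1])


# Toy data for illustration only; these entries are NOT taken from GeoCN.
haidian = Region("北京市", "北京市", "海淀区")
xihu = Region("浙江省", "杭州市", "西湖区")
glossaries = {
    haidian: {"颐和园", "中关村"},
    xihu: {"西湖", "灵隐寺"},
}
index = build_index(glossaries)
```

Calling `locate(index, text)` on a post that mentions 颐和园 and 中关村 would rank the hypothetical Haidian region first. The linear substring scan is only suitable for toy data; at the scale the abstract reports (1,763,476 GLWs), a multi-pattern matcher such as a trie or Aho-Corasick automaton would be a more realistic choice.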
