4.6 Article

A Focused Crawler for Borderlands Situation Information with Geographical Properties of Place Names

Journal

SUSTAINABILITY
Volume 6, Issue 10, Pages 6529-6552

Publisher

MDPI
DOI: 10.3390/su6106529

Keywords

focused crawler; place name; web information collection; borderlands situation; relevance calculation; spatial relations

Funding

  1. National Science Foundation of China [41301412, 41231172]
  2. Ministry of Science and Technology of China [2012BAK12B00]

Ask authors/readers for more resources

Place name is an important ingredient of borderlands situation information and plays a significant role in collecting them from the Internet with focused crawlers. However, current focused crawlers treat place name in the same way as any other common keyword, which has no geographical properties. This may reduce the effectiveness of focused crawlers. To solve the problem, this paper firstly discusses the importance of place name in focused crawlers in terms of location and spatial relation, and, then, proposes the two-tuple-based topic representation method to express place name and common keyword, respectively. Afterwards, spatial relations between place names are introduced to calculate the relevance of given topics and webpages, which can make the calculation process more accurately. On the basis of the above, a focused crawler prototype for borderlands situation information collection is designed and implemented. The crawling speed and F-Score are adopted to evaluate its efficiency and effectiveness. Experimental results indicate that the efficiency of our proposed focused crawler is consistent with the polite access interval and it could meet the daily demand of borderlands situation information collection. Additionally, the F-Score value of our proposed focused crawler increases by around 7%, which means that our proposed focused crawler is more effective than the traditional best-first focused crawler.

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

4.6
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

No Data Available
No Data Available