Article

Following the dynamic block on the Web

Journal

World Wide Web: Internet and Web Information Systems
Volume 19, Issue 6, Pages 1077-1101

Publisher

Springer
DOI: 10.1007/s11280-015-0374-9

Keywords

Dynamic block; Wrapper; Block tracking

Funding

  1. National Basic Research Program of China (973 Program) [2014CB340403]
  2. Fundamental Research Funds for the Central Universities
  3. Research Funds of Renmin University of China [14XNLF05, 15XNLF03]
  4. National Culture Science and Technology Promotion Plan
  5. National Natural Science Foundation of China [61502501]
  6. Secondary network prototype system development project, Xinhua News Agency

With the rapid changes in dynamic web pages, there is an increasing need to receive instant updates for dynamic blocks on the Web. In this paper, we address the problem of automatically following dynamic blocks in web pages. Given a user-specified block on a web page, we continuously track the content of the block and report its updates in real time. Such a service benefits users directly, for example by letting them track the top-ten breaking news on CNN, the prices of iPhones on Amazon, or NBA game scores. We study 3,346 human-labeled blocks from 1,127 pages and analyze the effectiveness of four types of patterns for tracking content blocks: visual area, DOM tree path, inner content, and close context. Because web pages change frequently, we find that the patterns generated on the original page may become invalid over time, causing the extraction of the correct block to fail. Based on these observations, we combine different patterns to improve the accuracy and stability of block extraction. Moreover, we propose an adaptive model that adapts each pattern individually and adjusts the pattern weights to improve their combination. The experimental results show that the proposed models outperform existing approaches, with the adaptive model performing best.
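The abstract describes combining four pattern types with weights and then adapting those weights over time. The sketch below shows one generic way such a weighted combination and weight adaptation could work; it is not the authors' model. The per-pattern similarity scores in [0, 1], the `Candidate` type, `select_block`, `adapt_weights`, and the multiplicative update rule controlled by `rate` are all hypothetical assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List

# The four pattern types named in the abstract (identifiers are hypothetical).
PATTERNS = ("visual_area", "dom_path", "inner_content", "close_context")


@dataclass
class Candidate:
    """A candidate block on the updated page (hypothetical representation)."""
    block_id: str
    # Assumed: similarity of this candidate to the original block under each pattern, in [0, 1].
    scores: Dict[str, float]


def combined_score(candidate: Candidate, weights: Dict[str, float]) -> float:
    """Weighted sum of the per-pattern similarity scores."""
    return sum(weights[p] * candidate.scores.get(p, 0.0) for p in PATTERNS)


def select_block(candidates: List[Candidate], weights: Dict[str, float]) -> Candidate:
    """Pick the candidate with the highest combined score."""
    return max(candidates, key=lambda c: combined_score(c, weights))


def adapt_weights(
    weights: Dict[str, float],
    chosen: Candidate,
    candidates: List[Candidate],
    rate: float = 0.5,
) -> Dict[str, float]:
    """Hypothetical multiplicative update: boost patterns whose own top-ranked
    candidate matches the chosen block, shrink the others, then renormalize."""
    updated = {}
    for p in PATTERNS:
        best_by_pattern = max(candidates, key=lambda c: c.scores.get(p, 0.0))
        agrees = best_by_pattern.block_id == chosen.block_id
        updated[p] = weights[p] * (1 + rate if agrees else 1 - rate)
    total = sum(updated.values())
    return {p: w / total for p, w in updated.items()}


if __name__ == "__main__":
    weights = {p: 0.25 for p in PATTERNS}
    candidates = [
        Candidate("b1", {"visual_area": 0.9, "dom_path": 0.2, "inner_content": 0.8, "close_context": 0.7}),
        Candidate("b2", {"visual_area": 0.3, "dom_path": 0.9, "inner_content": 0.4, "close_context": 0.2}),
    ]
    chosen = select_block(candidates, weights)
    print(chosen.block_id, adapt_weights(weights, chosen, candidates))
```

In this toy run, block `b1` wins under equal weights, and the DOM-path pattern, which alone preferred `b2`, loses weight after the update. A real system would need ground-truth feedback rather than agreement with its own choice, since this simple rule can reinforce its own mistakes.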
