4.4 Article

Relative Lempel-Ziv Factorization for Efficient Storage and Retrieval of Web Collections

Journal

PROCEEDINGS OF THE VLDB ENDOWMENT
Volume 5, Issue 3, Pages 265-273

Publisher

ASSOC COMPUTING MACHINERY
DOI: 10.14778/2078331.2078341

Funding

  1. Australian Research Council
  2. NICTA Victorian Research Laboratory
  3. Australian Government
  4. Digital Economy
  5. Australian Research Council through the ICT Centre of Excellence program
  6. Newton Fellowship

Compression techniques that support fast random access are a core component of any information system. Current state-of-the-art methods group documents into fixed-size blocks and compress each block with a general-purpose adaptive algorithm such as GZIP. Random access to a specific document then requires decompressing the whole block that contains it. The choice of block size is critical: it trades compression effectiveness against document retrieval time. In this paper we present a scalable compression method for large document collections that allows fast random access. We build a representative sample of the collection and use it as a dictionary in an LZ77-like encoding of the rest of the collection, relative to the dictionary. We demonstrate on large collections that, using a dictionary as small as 0.1% of the collection size, our algorithm is dramatically faster than previous methods and in general gives much better compression.
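
The idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the evenly spaced sampling strategy, the `(position, length)` factor layout, and the convention that a length of 0 marks a literal byte are all assumptions made for illustration. The naive `bytes.find` search stands in for whatever index a practical system would build over the dictionary.

```python
def build_dictionary(collection: bytes, dict_size: int, sample_len: int) -> bytes:
    """Concatenate evenly spaced samples of the collection (assumed sampling strategy)."""
    if dict_size >= len(collection):
        return collection
    n_samples = max(1, dict_size // sample_len)
    stride = len(collection) // n_samples
    return b"".join(
        collection[i * stride : i * stride + sample_len] for i in range(n_samples)
    )


def longest_match(dictionary: bytes, text: bytes, start: int) -> tuple[int, int]:
    """Return (pos, length) of the longest prefix of text[start:] found in dictionary."""
    pos, length = 0, 0
    while start + length < len(text):
        # Any occurrence of the longer prefix is also an occurrence of the
        # shorter one, so the search can resume from the current position.
        nxt = dictionary.find(text[start : start + length + 1], pos)
        if nxt == -1:
            break
        pos, length = nxt, length + 1
    return pos, length


def factorize(dictionary: bytes, doc: bytes) -> list[tuple[int, int]]:
    """Greedy LZ77-like parse of doc into factors that refer only to the dictionary."""
    factors = []
    i = 0
    while i < len(doc):
        pos, length = longest_match(dictionary, doc, i)
        if length == 0:
            # Byte absent from the dictionary: store it as a literal (assumed convention).
            factors.append((doc[i], 0))
            i += 1
        else:
            factors.append((pos, length))
            i += length
    return factors


def decode(dictionary: bytes, factors: list[tuple[int, int]]) -> bytes:
    """Rebuild one document from its factors; no other document is needed."""
    out = bytearray()
    for pos, length in factors:
        if length == 0:
            out.append(pos)  # literal byte stored in the position field
        else:
            out += dictionary[pos : pos + length]
    return bytes(out)


dictionary = b"the quick brown fox"
doc = b"quick fox!"
factors = factorize(dictionary, doc)
restored = decode(dictionary, factors)
```

Because every factor points into the fixed dictionary rather than into previously seen text, each document in this sketch can be decoded on its own, which is the property that makes random access possible without decompressing a surrounding block.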
