Relative Lempel-Ziv Factorization for Efficient Storage and Retrieval of Web Collections

PROCEEDINGS OF THE VLDB ENDOWMENT (2011)

Journal

PROCEEDINGS OF THE VLDB ENDOWMENT

Volume 5, Issue 3, Pages 265-273

Publisher

ASSOC COMPUTING MACHINERY

DOI: 10.14778/2078331.2078341

Categories

Computer Science, Information Systems; Computer Science, Theory & Methods

Funding

Australian Research Council
NICTA Victorian Research Laboratory
Australian Government
Digital Economy
Australian Research Council through the ICT Centre of Excellence program
Newton Fellowship

Abstract

Compression techniques that support fast random access are a core component of any information system. Current state-of-the-art methods group documents into fixed-size blocks and compress each block with a general-purpose adaptive algorithm such as GZIP. Random access to a specific document then requires decompressing the block that contains it. The choice of block size is critical: it trades compression effectiveness against document retrieval time. In this paper we present a scalable compression method for large document collections that allows fast random access. We build a representative sample of the collection and use it as a dictionary in an LZ77-like encoding of the rest of the collection, relative to the dictionary. We demonstrate on large collections that, using a dictionary as small as 0.1% of the collection size, our algorithm is dramatically faster than previous methods, and in general gives much better compression.
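The idea of encoding a document relative to a fixed dictionary can be illustrated with a small sketch. This is not the authors' implementation: the function names `rlz_encode` and `rlz_decode`, the `(position, length)` factor layout, and the literal-byte convention are assumptions made for illustration, and the longest-match search is a naive scan where a practical system would presumably use an index over the dictionary.

```python
from typing import List, Tuple

# A factor is (position_in_dictionary, length). As an illustrative
# convention (an assumption, not taken from the paper), length 0 marks a
# literal byte absent from the dictionary; the byte value sits in slot 0.
Factor = Tuple[int, int]


def rlz_encode(dictionary: bytes, document: bytes) -> List[Factor]:
    """Greedily split `document` into longest matches against `dictionary`."""
    factors: List[Factor] = []
    i = 0
    while i < len(document):
        best_pos, best_len = 0, 0
        # Naive search: try every dictionary offset and keep the longest match.
        for start in range(len(dictionary)):
            length = 0
            while (i + length < len(document)
                   and start + length < len(dictionary)
                   and dictionary[start + length] == document[i + length]):
                length += 1
            if length > best_len:
                best_pos, best_len = start, length
        if best_len == 0:
            factors.append((document[i], 0))  # literal byte
            i += 1
        else:
            factors.append((best_pos, best_len))
            i += best_len
    return factors


def rlz_decode(dictionary: bytes, factors: List[Factor]) -> bytes:
    """Rebuild one document using only the dictionary and its own factors."""
    out = bytearray()
    for pos, length in factors:
        if length == 0:
            out.append(pos)  # literal byte stored in the position slot
        else:
            out += dictionary[pos:pos + length]
    return bytes(out)


# Hypothetical toy data: a tiny "sample" dictionary and one document.
dictionary = b"<html><body></body></html>"
document = b"<html><body>hi</body></html>"
factors = rlz_encode(dictionary, document)
assert rlz_decode(dictionary, factors) == document
```

Because each document is decoded from the shared dictionary and its own factor list alone, retrieving one document in this sketch does not require decompressing its neighbours, which is the contrast with block-based compression that the abstract draws.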
