4.7 Article

Indexing Highly Repetitive String Collections, Part II: Compressed Indexes

期刊

ACM COMPUTING SURVEYS
卷 54, 期 2, 页码 -

出版社

ASSOC COMPUTING MACHINERY
DOI: 10.1145/3432999

关键词

Text indexing; string searching; compressed data structures; repetitive string collections

资金

  1. ANID Basal Funds [FB0001]
  2. Millennium Science Initiative Program [ICN17_002]
  3. Fondecyt, Chile [1-200038]

向作者/读者索取更多资源

The text discusses a breakthrough in indexing string collections, which allows representing them in a compressed space while offering indexed search functionalities. With the technology permeating through applications like bioinformatics, string collections have experienced rapid growth, particularly due to their high repetitiveness. New techniques have been developed to properly exploit this repetitiveness, leading to the creation of a new generation of data structures capable of handling huge repetitive string collections.
Two decades ago, a breakthrough in indexing string collections made it possible to represent them within their compressed space while at the same time offering indexed search functionalities. As this new technology permeated through applications like bioinformatics, the string collections experienced a growth that outperforms Moore's Law and challenges our ability of handling them even in compressed form. It turns out, fortunately, that many of these rapidly growing string collections are highly repetitive, so that their information content is orders of magnitude lower than their plain size. The statistical compression methods used for classical collections, however, are blind to this repetitiveness, and therefore a new set of techniques has been developed to properly exploit it. The resulting indexes form a new generation of data structures able to handle the huge repetitive string collections that we are facing. In this survey, formed by two parts, we cover the algorithmic developments that have led to these data structures. In this second part, we describe the fundamental algorithmic ideas and data structures that form the base of all the existing indexes, and the various concrete structures that have been proposed, comparing them both in theoretical and practical aspects, and uncovering some new combinations. We conclude with the current challenges in this fascinating field.

作者

我是这篇论文的作者
点击您的名字以认领此论文并将其添加到您的个人资料中。

评论

主要评分

4.7
评分不足

次要评分

新颖性
-
重要性
-
科学严谨性
-
评价这篇论文

推荐

暂无数据
暂无数据