☆ 4.3 Article

Space-efficient construction of compressed suffix trees

THEORETICAL COMPUTER SCIENCE (2021)

期刊

THEORETICAL COMPUTER SCIENCE

卷 852, 期 -, 页码 138-156

出版社

ELSEVIER

DOI: 10.1016/j.tcs.2020.11.024

关键词

Burrows-Wheeler transform; Compressed suffix tree; LCP; PLCP

类别

Computer Science, Theory & Methods

资金

[RBSI146R5L]

向作者/读者索取更多资源

Protocol

社区支持

Reagent

社区支持

智能总结 New
摘要

The paper demonstrates how to construct data structures essential for string processing using the Burrows-Wheeler transform (BWT) as input. The algorithms provided for enumerating LCP values, suffix tree intervals, PLCP bitvector, and balanced parentheses representation have efficient space and time complexities. Additionally, the paper proposes solutions for merging BWTs of string collections with improved efficiency.

We show how to build several data structures of central importance to string processing by taking as input the Burrows-Wheeler transform (BWT) and using small extra working space. Let n be the text length and sigma be the alphabet size. We first provide two algorithms that enumerate all LCP values and suffix tree intervals in O(n log sigma) time using just o(n log sigma) bits of working space on top of the input re-writable BWT. Using these algorithms as building blocks, for any parameter 0 < epsilon <= 1 we show how to build the PLCP bitvector and the balanced parentheses representation of the suffix tree topology in O (n(log sigma + epsilon(-1).log log n)) time using at most n logs . (epsilon + o(1)) bits of working space on top of the input re-writable BWT and the output. For example, we can build a compressed suffix tree from the BWT using just succinct working space (i.e. o(n log sigma) bits) and Theta (n log sigma + n(log log n)(1+delta)) time, for any constant delta > 0. This improves the previous most space-efficient algorithms, which worked in O(n) bits and O(n log n) time. We also consider the problem of merging BWTs of string collections, and provide a solution running in O(n log sigma) time and using just o(n log sigma) bits of working space. An efficient implementation of our LCP construction and BWT merge algorithms uses (in RAM) as few as n bits on top of a packed representation of the input/output and process data as fast as 2.92 megabases per second. (C) 2020 Elsevier B.V. All rights reserved.

Space-efficient construction of compressed suffix trees

期刊

THEORETICAL COMPUTER SCIENCE

出版社

ELSEVIER

关键词

类别

资金

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

Space-efficient construction of compressed suffix trees

期刊

THEORETICAL COMPUTER SCIENCE

出版社

ELSEVIER

关键词

类别

资金

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

导出引文

分享论文