4.3 Article

Space-efficient construction of compressed suffix trees

期刊

THEORETICAL COMPUTER SCIENCE
卷 852, 期 -, 页码 138-156

出版社

ELSEVIER
DOI: 10.1016/j.tcs.2020.11.024

关键词

Burrows-Wheeler transform; Compressed suffix tree; LCP; PLCP

资金

  1. [RBSI146R5L]

向作者/读者索取更多资源

The paper demonstrates how to construct data structures essential for string processing using the Burrows-Wheeler transform (BWT) as input. The algorithms provided for enumerating LCP values, suffix tree intervals, PLCP bitvector, and balanced parentheses representation have efficient space and time complexities. Additionally, the paper proposes solutions for merging BWTs of string collections with improved efficiency.
We show how to build several data structures of central importance to string processing by taking as input the Burrows-Wheeler transform (BWT) and using small extra working space. Let n be the text length and sigma be the alphabet size. We first provide two algorithms that enumerate all LCP values and suffix tree intervals in O(n log sigma) time using just o(n log sigma) bits of working space on top of the input re-writable BWT. Using these algorithms as building blocks, for any parameter 0 < epsilon <= 1 we show how to build the PLCP bitvector and the balanced parentheses representation of the suffix tree topology in O (n(log sigma + epsilon(-1).log log n)) time using at most n logs . (epsilon + o(1)) bits of working space on top of the input re-writable BWT and the output. For example, we can build a compressed suffix tree from the BWT using just succinct working space (i.e. o(n log sigma) bits) and Theta (n log sigma + n(log log n)(1+delta)) time, for any constant delta > 0. This improves the previous most space-efficient algorithms, which worked in O(n) bits and O(n log n) time. We also consider the problem of merging BWTs of string collections, and provide a solution running in O(n log sigma) time and using just o(n log sigma) bits of working space. An efficient implementation of our LCP construction and BWT merge algorithms uses (in RAM) as few as n bits on top of a packed representation of the input/output and process data as fast as 2.92 megabases per second. (C) 2020 Elsevier B.V. All rights reserved.

作者

我是这篇论文的作者
点击您的名字以认领此论文并将其添加到您的个人资料中。

评论

主要评分

4.3
评分不足

次要评分

新颖性
-
重要性
-
科学严谨性
-
评价这篇论文

推荐

暂无数据
暂无数据