Journal
THEORETICAL COMPUTER SCIENCE
Volume 678, Issue -, Pages 22-39Publisher
ELSEVIER SCIENCE BV
DOI: 10.1016/j.tcs.2017.03.039
Keywords
Data structures; Suffix array; LCP array; Document array; String collections
Categories
Funding
- CAPES
- CNPq [162338/2015-5]
Ask authors/readers for more resources
Constructing the suffix array for a string collection is an important task that may be performed by sorting the concatenation of all strings. In this article we present algorithms gSAIS and gSACA-K, which extend SAIS (Nong et al., 2011) [8] and SACA-IC (Nong, 2013) [10] to construct the suffix array for a string collection maintaining their theoretical bounds, respecting the order among all suffixes, and improving their practical performance. gSACA-K is an optimal suffix array construction algorithm for string collections from constant alphabets. Moreover, we show how to modify gSAIS and gSACA-K to also compute the longest common prefix array and the document array as a byproduct of suffix sorting, with the same theoretical bounds. We performed experiments that showed that our algorithms have an outstanding practical performance. (C) 2017 Elsevier B.V. All rights reserved.
Authors
I am an author on this paper
Click your name to claim this paper and add it to your profile.
Reviews
Recommended
No Data Available