☆ 4.5 Article

AllSome Sequence Bloom Trees

JOURNAL OF COMPUTATIONAL BIOLOGY (2018)

Journal

JOURNAL OF COMPUTATIONAL BIOLOGY

Volume 25, Issue 5, Pages 467-479

Publisher

MARY ANN LIEBERT, INC

DOI: 10.1089/cmb.2017.0258

Keywords

Sequence Bloom Trees; Bloom filters; RNA-seq; data structures; algorithms; bioinformatics

Funding

NSF [DBI-1356529, CCF-551439057, IIS-1453527, IIS-1421908]
Direct For Biological Sciences
Div Of Biological Infrastructure [1356529] Funding Source: National Science Foundation
Division of Computing and Communication Foundations
Direct For Computer & Info Scie & Enginr [1439057] Funding Source: National Science Foundation

Ask authors/readers for more resources

Protocol

Community support

Reagent

Community support

Abstract

The ubiquity of next-generation sequencing has transformed the size and nature of many databases, pushing the boundaries of current indexing and searching methods. One particular example is a database of 2652 human RNA-seq experiments uploaded to the Sequence Read Archive (SRA). Recently, Solomon and Kingsford proposed the Sequence Bloom Tree data structure and demonstrated how it can be used to accurately identify SRA samples that have a transcript of interest potentially expressed. In this article, we propose an improvement called the AllSome Sequence Bloom Tree. Results show that our new data structure significantly improves performance, reducing the tree construction time by 52.7% and query time by 39%-85%, with a price of upto 3xmemory consumption during queries. Notably, it can query a batch of 198,074 queries in <8 hours (compared with around 2 days previously) and a whole set of k-mers from a sequencing experiment (about 27 million k-mers) in <11 minutes.

AllSome Sequence Bloom Trees

Journal

JOURNAL OF COMPUTATIONAL BIOLOGY

Publisher

MARY ANN LIEBERT, INC

Keywords

Categories

Funding

Ask authors/readers for more resources

Protocol

Reagent

Authors

I am an author on this paper

Reviews

Primary Rating

Secondary Ratings

Novelty

Significance

Scientific rigor

Rate this paper

Recommended

AllSome Sequence Bloom Trees

Journal

JOURNAL OF COMPUTATIONAL BIOLOGY

Publisher

MARY ANN LIEBERT, INC

Keywords

Categories

Funding

Ask authors/readers for more resources

Protocol

Reagent

Authors

I am an author on this paper

Reviews

Primary Rating

Secondary Ratings

Novelty

Significance

Scientific rigor

Rate this paper

Recommended

Export Citation

Share Paper