Article

Fast search of thousands of short-read sequencing experiments

Journal

NATURE BIOTECHNOLOGY
Volume 34, Issue 3, Pages 300-+

Publisher

NATURE PUBLISHING GROUP
DOI: 10.1038/nbt.3442

Funding

  1. Gordon and Betty Moore Foundation's Data-Driven Discovery Initiative [GBMF4554]
  2. US National Science Foundation [CCF-1256087, CCF-1319998]
  3. US National Institutes of Health [R21HG006913, R01HG007104]
  4. US National Institutes of Health as part of the Howard Hughes Medical Institute (HHMI)-National Institute of Biomedical Imaging and Bioengineering (NIBIB) Interfaces Initiative [T32 EB009403]
  5. US National Science Foundation, Directorate for Computer and Information Science and Engineering, Division of Computing and Communication Foundations [1319998]
  6. US National Science Foundation, Directorate for Computer and Information Science and Engineering, Division of Computing and Communication Foundations [1256087]
  7. US National Science Foundation, Directorate for Computer and Information Science and Engineering, Office of Advanced Cyberinfrastructure (OAC) [1445606]

The amount of sequence information in public repositories is growing at a rapid rate. Although these data are likely to contain clinically important information that has not yet been uncovered, our ability to effectively mine these repositories is limited. Here we introduce Sequence Bloom Trees (SBTs), a method for querying thousands of short-read sequencing experiments by sequence, 162 times faster than existing approaches. The approach searches large data archives for all experiments that involve a given sequence. We use SBTs to search 2,652 human blood, breast and brain RNA-seq experiments for all 214,293 known transcripts in under 4 days using less than 239 MB of RAM and a single CPU. Searching sequence archives at this scale and in this time frame is currently not possible using existing tools.
