Article

Scaling metagenome sequence assembly with probabilistic de Bruijn graphs

Publisher

NATL ACAD SCIENCES
DOI: 10.1073/pnas.1121464109

Keywords

metagenomics; compression

Funding

  1. Agriculture and Food Research Initiative from the United States Department of Agriculture, National Institute of Food and Agriculture [2010-65205-20361]
  2. National Science Foundation [IOS-0923812]
  3. NSF [0905961]
  4. Directorate for Biological Sciences
  5. Division of Biological Infrastructure [0905961] Funding Source: National Science Foundation
  6. Division of Integrative Organismal Systems
  7. Directorate for Biological Sciences [0923812] Funding Source: National Science Foundation
  8. NIFA [2010-65205-20361, 581141] Funding Source: Federal RePORTER

Deep sequencing has enabled the investigation of a wide range of environmental microbial ecosystems, but the high memory requirements for de novo assembly of short-read shotgun sequencing data from these complex populations are an increasingly large practical barrier. Here we introduce a memory-efficient graph representation with which we can analyze the k-mer connectivity of metagenomic samples. The graph representation is based on a probabilistic data structure, a Bloom filter, that allows us to efficiently store assembly graphs in as little as 4 bits per k-mer, albeit inexactly. We show that this data structure accurately represents DNA assembly graphs in low memory. We apply this data structure to the problem of partitioning assembly graphs into components as a prelude to assembly, and show that this reduces the overall memory requirements for de novo assembly of metagenomes. On one soil metagenome assembly, this approach achieves a nearly 40-fold decrease in the maximum memory requirements for assembly. This probabilistic graph representation is a significant theoretical advance in storing assembly graphs and also yields immediate leverage on metagenomic assembly.
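The abstract's central idea is to store k-mers only in a Bloom filter and to recover the assembly graph's edges implicitly, then partition the graph into connected components before assembly. The minimal Python sketch below illustrates that idea under stated assumptions; it is not the authors' implementation. The names `BloomDeBruijnGraph`, `partition_reads`, and `expected_fp_rate` are hypothetical. The choices of seeded BLAKE2b hashing, canonical (strand-independent) k-mers, and an exact dictionary of visited k-mers during the search are simplifications made for brevity. A memory-conscious implementation would avoid storing every visited k-mer explicitly.

```python
import hashlib
import math
import random
from collections import deque

# Complement table for the DNA alphabet (assumes reads contain only A, C, G, T).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(COMPLEMENT)[::-1])


class BloomDeBruijnGraph:
    """Hypothetical sketch: k-mers are stored only as bits in a Bloom filter."""

    def __init__(self, k: int, num_bits: int, num_hashes: int):
        self.k = k
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, kmer: str):
        # One BLAKE2b digest per hash function, seeded by prefixing the index.
        for seed in range(self.num_hashes):
            data = f"{seed}:{kmer}".encode()
            digest = hashlib.blake2b(data, digest_size=8).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, kmer: str) -> None:
        for pos in self._positions(canonical(kmer)):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def __contains__(self, kmer: str) -> bool:
        # Never False for an added k-mer; occasionally True for one never added.
        return all(self.bits[pos >> 3] & (1 << (pos & 7))
                   for pos in self._positions(canonical(kmer)))

    def add_read(self, read: str) -> None:
        for i in range(len(read) - self.k + 1):
            self.add(read[i:i + self.k])

    def neighbors(self, kmer: str):
        # Edges are implicit: probe the 4 possible successors and 4 predecessors.
        for base in "ACGT":
            for cand in (kmer[1:] + base, base + kmer[:-1]):
                if cand in self:
                    yield cand


def partition_reads(reads, graph: BloomDeBruijnGraph):
    """Label each read with the id of the connected component containing it."""
    component_of = {}  # canonical k-mer -> component id, kept exactly for simplicity
    labels = []
    next_id = 0
    for read in reads:
        start = read[:graph.k]
        if canonical(start) not in component_of:
            # Breadth-first search from this read's first k-mer over the implicit graph.
            component_of[canonical(start)] = next_id
            queue = deque([start])
            while queue:
                for nb in graph.neighbors(queue.popleft()):
                    if canonical(nb) not in component_of:
                        component_of[canonical(nb)] = next_id
                        queue.append(nb)
            next_id += 1
        labels.append(component_of[canonical(start)])
    return labels


def expected_fp_rate(bits_per_kmer: float, num_hashes: int) -> float:
    """Textbook Bloom filter estimate (1 - e^(-h*n/m))^h, with m/n = bits_per_kmer."""
    return (1.0 - math.exp(-num_hashes / bits_per_kmer)) ** num_hashes


if __name__ == "__main__":
    rng = random.Random(42)
    genomes = ["".join(rng.choice("ACGT") for _ in range(200)) for _ in range(2)]
    # Overlapping 50-bp reads every 10 bp from each of two unrelated sequences.
    reads = [g[i:i + 50] for g in genomes for i in range(0, 151, 10)]
    graph = BloomDeBruijnGraph(k=21, num_bits=1 << 20, num_hashes=4)
    for read in reads:
        graph.add_read(read)
    print(partition_reads(reads, graph))
    print(round(expected_fp_rate(4, 3), 3))
```

A Bloom filter has no false negatives, so every k-mer of every inserted read is found again, and reads drawn from the same connected sequence always receive the same label. False positives can only add spurious nodes or edges, which may occasionally merge components that should be separate; this is the sense in which the representation is inexact. `expected_fp_rate(4, 3)` evaluates the standard approximation at 4 bits per k-mer with three hash functions, giving roughly 0.147. The paper's own error analysis may use a different model.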
