4.7 Article

Fast de Bruijn Graph Compaction in Distributed Memory Environments

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TCBB.2018.2858797

Keywords

De Bruijn graph; genome assembly; graph compaction; parallel algorithms

Funding

  1. National Science Foundation [IIS-1416259, CNS-1229081, CCF-1360593, CCF-1361053]
  2. Intel Parallel Computing Center on Big Data in Biosciences and Public Health
  3. Division of Computing and Communication Foundations
  4. Directorate for Computer & Information Science & Engineering [1361053] (funding source: National Science Foundation)

De Bruijn graph-based genome assembly has gained popularity as short-read sequencers have become ubiquitous. A core assembly operation is the generation of unitigs, which are sequences corresponding to chains in the graph. Unitigs serve as building blocks for generating longer sequences in many assemblers, and can facilitate graph compression. Chain compaction, the process by which unitigs are generated, remains a critical computational task. In this paper, we present a distributed memory parallel algorithm for simultaneous compaction of all chains in bi-directed de Bruijn graphs. The key advantages of our algorithm are that it bounds the chain compaction run-time to a number of iterations logarithmic in the length of the longest chain, and that it can differentiate cycles from chains within a number of iterations logarithmic in the length of the longest cycle. Our algorithm scales to thousands of computational cores, and can compact a whole-genome de Bruijn graph from a human sequence read set in 7.3 seconds using 7680 distributed memory cores, and in 12.9 minutes using 64 shared memory cores. It is 3.7x and 2.0x faster than equivalent steps in the state-of-the-art tools for distributed and shared memory environments, respectively. An implementation of the algorithm is available at https://github.com/ParBLiSS/bruno.
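
To make the logarithmic iteration bound concrete, here is a minimal single-process sketch of classic pointer doubling (pointer jumping) on a simplified graph. It is not the paper's algorithm or the bruno implementation: it assumes a plain directed successor map instead of a bi-directed de Bruijn graph, ignores distribution across cores, and separates cycles from chains only after the chains have been resolved. The function name `compact_chains` and the `succ` map are hypothetical names chosen for illustration.

```python
def compact_chains(succ):
    """Pointer doubling over a successor map (illustrative sketch only).

    succ maps every node to its successor, or to None when the node ends
    a chain. Every successor is assumed to also be a key of succ.
    Returns (end, dist, on_cycle, rounds):
      end[v]   - last node of the chain containing v
      dist[v]  - number of hops from v to end[v]
      on_cycle - nodes that never reach a chain end
      rounds   - doubling rounds that resolved at least one node
    """
    # Chain ends point to themselves at distance 0, so jumping past
    # an end is a no-op.
    ptr = {v: (v if s is None else s) for v, s in succ.items()}
    dist = {v: (0 if s is None else 1) for v, s in succ.items()}

    def resolved(p):
        # A node is resolved once its pointer rests on a true chain end.
        return {v for v in p if succ[p[v]] is None}

    done = resolved(ptr)
    rounds = 0
    while True:
        # One doubling round: every pointer jumps to its pointer's target,
        # so after k rounds a pointer spans up to 2**k hops.
        new_ptr = {v: ptr[ptr[v]] for v in ptr}
        new_dist = {v: dist[v] + dist[ptr[v]] for v in ptr}
        new_done = resolved(new_ptr)
        # Round k resolves exactly the nodes whose distance d to the end
        # satisfies 2**(k-1) < d <= 2**k. A round that resolves nothing
        # therefore means every chain node is already resolved.
        if len(new_done) == len(done):
            break
        ptr, dist, done = new_ptr, new_dist, new_done
        rounds += 1

    end = {v: ptr[v] for v in done}
    hops = {v: dist[v] for v in done}
    on_cycle = set(succ) - done  # pointers on a cycle never reach an end
    return end, hops, on_cycle, rounds


if __name__ == "__main__":
    graph = {"a": "b", "b": "c", "c": "d", "d": "e", "e": None,
             "x": "y", "y": "z", "z": "x"}
    end, hops, on_cycle, rounds = compact_chains(graph)
    print(end, hops, sorted(on_cycle), rounds)
```

In this sketch the five-node chain is resolved in two productive rounds, matching the ceiling of log2 of its four hops, while the three-node cycle is reported separately because its pointers never land on a chain end. In a distributed setting each round would additionally require exchanging pointer updates between processes, which this sketch does not model.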
