Article

Unbiased pangenome graphs


This study presents seqwish, an algorithm that builds a variation graph from a set of sequences and the alignments between them. It transforms the alignment set into a tree-based representation, then queries that representation to induce the graph. Recording how each input sequence maps onto the graph yields a pangenome variation graph. The method is scalable and has been applied to build pangenome graphs for several species.

Motivation: Pangenome variation graphs model the mutual alignment of collections of DNA sequences. A set of pairwise alignments implies a variation graph, but there are no scalable methods to generate such a graph from these alignments. Existing related approaches depend on a single reference, a specific ordering of genomes or a de Bruijn model based on a fixed k-mer length. A scalable, self-contained method to build pangenome graphs without such limitations would be a key step in pangenome construction and manipulation pipelines.

Results: We design the seqwish algorithm, which builds a variation graph from a set of sequences and alignments between them. We first transform the alignment set into an implicit interval tree. To build up the variation graph, we query this tree-based representation of the alignments to reduce transitive matches into single DNA segments in a sequence graph. By recording the mapping from input sequence to output graph, we can trace the original paths through this graph, yielding a pangenome variation graph. We present an implementation that operates in external memory, using disk-backed data structures and lock-free parallel methods to drive the core graph induction step. We demonstrate that our method scales to very large graph induction problems by applying it to build pangenome graphs for several species.

Availability and implementation: seqwish is published as free software under the MIT open source license. Source code and documentation are available at https://github.com/ekg/seqwish. seqwish can be installed via Bioconda (https://bioconda.github.io/recipes/seqwish/README.html) or GNU Guix (https://github.com/ekg/guix-genomics/blob/master/seqwish.scm).

Contact: egarris5@uthsc.edu
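
The idea of collapsing transitive matches into single graph segments can be illustrated with a small in-memory sketch. This is not the seqwish implementation: it swaps the implicit interval tree and disk-backed structures for a plain union-find over base positions, keeps one base per node without compacting runs into longer segments, and assumes matches are exact, same-strand, and given as hypothetical tuples `(name_a, start_a, name_b, start_b, length)`. The function name `induce_graph` is likewise an assumption made for illustration.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical match record: (name_a, start_a, name_b, start_b, length),
# meaning seqs[name_a][start_a:start_a+length] == seqs[name_b][start_b:start_b+length].
Match = Tuple[str, int, str, int, int]


def induce_graph(
    seqs: Dict[str, str], matches: List[Match]
) -> Tuple[List[str], Set[Tuple[int, int]], Dict[str, List[int]]]:
    """Toy graph induction: returns (node labels, edges, path per sequence)."""
    # Lay all sequences end to end in one global coordinate space.
    offsets: Dict[str, int] = {}
    total = 0
    for name, s in seqs.items():
        offsets[name] = total
        total += len(s)
    text = "".join(seqs.values())

    # Union-find over global base positions (stand-in for the interval tree).
    parent = list(range(total))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a: int, b: int) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    # Merging matched positions makes the closure transitive automatically:
    # if a~b and b~c, then a, b and c end up in the same class.
    for name_a, start_a, name_b, start_b, length in matches:
        for i in range(length):
            pa = offsets[name_a] + start_a + i
            pb = offsets[name_b] + start_b + i
            if text[pa] != text[pb]:
                raise ValueError("match covers mismatched bases")
            union(pa, pb)

    # Each equivalence class becomes one node; each input sequence becomes a
    # path of node ids, which records the sequence-to-graph mapping.
    node_of_root: Dict[int, int] = {}
    nodes: List[str] = []
    edges: Set[Tuple[int, int]] = set()
    paths: Dict[str, List[int]] = {}
    for name, s in seqs.items():
        path = []
        for i in range(len(s)):
            root = find(offsets[name] + i)
            if root not in node_of_root:
                node_of_root[root] = len(nodes)
                nodes.append(text[root])
            path.append(node_of_root[root])
        edges.update(zip(path, path[1:]))
        paths[name] = path
    return nodes, edges, paths
```

With three sequences `a = "ACGT"`, `b = "ACTT"` and `c = "ACGT"`, aligning only a to b and b to c over their shared `AC` prefix is enough to place the first two bases of all three sequences in the same two nodes, and spelling out the node labels along each path reproduces the corresponding input sequence.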
