4.6 Article

A tri-tuple coordinate system derived for fast and accurate analysis of the colored de Bruijn graph-based pangenomes

Journal

BMC BIOINFORMATICS
Volume 22, Issue 1, Pages -

Publisher

BMC
DOI: 10.1186/s12859-021-04149-w

Keywords

Genome graph; Coordinate system; Variant detection

Funding

  1. State Key Basic Research and Development Plan [2017YFA0605104]
  2. Key project of the State Key Laboratory of Earth Surface Processes and Resource Ecology

The study presents a new method, the colored superbubble (cSupB), for organizing and analyzing the spatial structure of genome graphs, and proposes a novel tri-tuple coordinate system based on it. The method efficiently detects small indels and can adapt to complex cycle structures.
Background: With the rapid development of accurate sequencing and assembly technologies, an increasing number of high-quality chromosome-level and haplotype-resolved genome assemblies have become available, creating great opportunities for computational pangenomics. Although genome graphs are among the most useful models for pangenome representation, their structural complexity makes it difficult to present genome information as intuitively as a linear reference genome does. Thus, efficiently and accurately analyzing the spatial structure of a genome graph and coordinating the information it holds remains a substantial challenge.

Results: We developed a new method, the colored superbubble (cSupB), that can overcome the complexity of graphs and organize a set of species- or population-specific haplotype sequences of interest. Based on this model, we propose a tri-tuple coordinate system that combines an offset value, topological structure and sample information. Additionally, cSupB provides a novel method that utilizes complete topological information and efficiently detects small indels (<50 bp) for highly similar samples, as can be validated with simulated datasets. Moreover, we demonstrated that cSupB can adapt to complex cycle structures.

Conclusions: Although the solution is made suitable for increasingly complex genome graphs by relaxing the directed acyclic graph constraint, the cSupB motif and the cSupB method can be extended to any colored directed acyclic graph. We anticipate that our method will facilitate the analysis of individual haplotype variants and population genomic diversity. We have developed a C++ program implementing our method, which is available at https://github.com/eggleader/cSupB.
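
The abstract describes the tri-tuple coordinate only at a high level (an offset value, topological structure and sample information). The Python sketch below illustrates what such a coordinate could look like on a toy bubble. It is not the cSupB implementation: the `TriTuple` fields, the `coordinates` helper, the bubble label `"B1"`, and the rule of keeping the smallest offset when sample paths disagree are all assumptions made for illustration.

```python
from typing import Dict, FrozenSet, List, NamedTuple


class TriTuple(NamedTuple):
    """Hypothetical tri-tuple: field names and meanings are assumptions."""
    offset: int              # bases from the bubble entrance to the node start
    bubble: str              # label of the enclosing bubble (topology)
    samples: FrozenSet[str]  # samples (colors) whose paths visit the node


def coordinates(paths: Dict[str, List[str]],
                node_len: Dict[str, int],
                bubble: str) -> Dict[str, TriTuple]:
    """Assign a toy tri-tuple to every node visited by the sample paths."""
    offsets: Dict[str, int] = {}
    colors: Dict[str, set] = {}
    for sample, path in paths.items():
        pos = 0
        for node in path:
            # Assumption: when paths disagree, keep the smallest offset.
            offsets[node] = min(offsets.get(node, pos), pos)
            colors.setdefault(node, set()).add(sample)
            pos += node_len[node]
    return {n: TriTuple(offsets[n], bubble, frozenset(colors[n]))
            for n in offsets}


# Two haplotypes through one bubble: hapA carries a 2 bp insertion (node "x").
paths = {"hapA": ["s", "x", "t"], "hapB": ["s", "t"]}
node_len = {"s": 4, "x": 2, "t": 5}
coords = coordinates(paths, node_len, "B1")
```

In this toy case, node `x` is visited only by `hapA`, so its sample set alone marks it as a candidate small insertion relative to `hapB`, which is the kind of information a coordinate combining topology and sample colors makes directly readable.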
