4.7 Article

Improved Node and Arc Multiplicity Estimation in De Bruijn Graphs Using Approximate Inference in Conditional Random Fields

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TCBB.2022.3229085

Keywords

Belief propagation; conditional random fields; de Bruijn graphs; message passing order

In de novo genome assembly, accurate determination of node and arc multiplicities in a de Bruijn graph is crucial for improving assembly quality and contiguity. A Conditional Random Field (CRF) model is proposed to model the de Bruijn graph and read coverage information. Loopy Belief Propagation (LBP) is applied for approximate inference to improve multiplicity assignment accuracy. The order of message passing in LBP greatly affects its convergence speed, and an empirical evaluation of different message passing schemes is presented.
In de novo genome assembly using short Illumina reads, the accurate determination of node and arc multiplicities in a de Bruijn graph has a large impact on the quality and contiguity of the assembly. The multiplicity estimates of nodes and arcs guide the cleaning of the de Bruijn graph by identifying spurious nodes and arcs that correspond to sequencing errors. Additionally, they can be used to guide repeat resolution. Here, we model the entire de Bruijn graph and the accompanying read coverage information with a single Conditional Random Field (CRF) model. We show that approximate inference using Loopy Belief Propagation (LBP) on our model improves multiplicity assignment accuracy within feasible runtimes. The order in which messages are passed has a large influence on the speed of LBP convergence. Few theoretical guarantees exist, and the conditions for convergence are not easily checked because our CRF model contains higher-order interactions. Therefore, we also present an empirical evaluation of several message passing schemes that may guide future users of LBP on CRFs with higher-order interactions in their choice of message passing scheme.
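To make the role of message passing order more concrete, the following is a minimal sketch of sum-product LBP on a toy model, run under two schedules. It is not the model or implementation from the paper: the four-node cycle, the binary states, the shared pairwise potential, and the names `run_lbp`, `unary`, `edges`, and `pairwise` are all assumptions chosen for illustration, and the toy has only pairwise factors rather than the higher-order interactions of the CRF described above. The "synchronous" schedule computes every message from the previous sweep's values, while the "sequential" schedule lets each update immediately see earlier updates from the same sweep.

```python
def normalize(v):
    """Scale a list of non-negative numbers so it sums to 1."""
    s = sum(v)
    return [x / s for x in v]


def run_lbp(unary, edges, pairwise, schedule="sequential",
            tol=1e-10, max_sweeps=1000):
    """Sum-product loopy belief propagation on a pairwise binary model.

    unary:    dict node -> [phi(0), phi(1)]   (hypothetical toy potentials)
    edges:    list of undirected edges (i, j)
    pairwise: symmetric 2x2 table shared by all edges (an assumption of
              this sketch, so edge orientation does not matter)
    Returns (beliefs, number_of_sweeps_used).
    """
    directed = list(edges) + [(j, i) for i, j in edges]
    neighbors = {n: [] for n in unary}
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    # Start from uniform messages on every directed edge.
    msgs = {d: [0.5, 0.5] for d in directed}

    def new_message(i, j, current):
        # Message i -> j: sum over x_i of the local potentials times all
        # incoming messages to i except the one coming from j.
        out = []
        for xj in (0, 1):
            total = 0.0
            for xi in (0, 1):
                prod = unary[i][xi] * pairwise[xi][xj]
                for k in neighbors[i]:
                    if k != j:
                        prod *= current[(k, i)][xi]
                total += prod
            out.append(total)
        return normalize(out)

    for sweep in range(1, max_sweeps + 1):
        # Synchronous: read from a snapshot of the previous sweep.
        # Sequential: read from the live table, so fresh updates are reused.
        source = dict(msgs) if schedule == "synchronous" else msgs
        delta = 0.0
        for i, j in directed:
            updated = new_message(i, j, source)
            change = max(abs(a - b) for a, b in zip(updated, msgs[(i, j)]))
            delta = max(delta, change)
            msgs[(i, j)] = updated
        if delta < tol:
            break

    # Belief at a node: unary potential times all incoming messages.
    beliefs = {}
    for n in unary:
        b = list(unary[n])
        for k in neighbors[n]:
            b = [b[x] * msgs[(k, n)][x] for x in (0, 1)]
        beliefs[n] = normalize(b)
    return beliefs, sweep


# Toy 4-cycle with a weak attractive coupling (illustrative values only).
unary = {0: [0.7, 0.3], 1: [0.5, 0.5], 2: [0.4, 0.6], 3: [0.5, 0.5]}
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
pairwise = [[1.2, 1.0], [1.0, 1.2]]

for schedule in ("synchronous", "sequential"):
    beliefs, sweeps = run_lbp(unary, edges, pairwise, schedule)
    print(schedule, sweeps, [round(beliefs[n][1], 4) for n in sorted(beliefs)])
```

Printing the sweep count for each schedule gives a rough feel for how the update order can change convergence speed even when both schedules reach the same beliefs. On larger graphs with stronger or higher-order interactions, the schedules may differ far more, or may not converge at all, which is why an empirical comparison such as the one described in the abstract is useful.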
