期刊
JOURNAL OF COMPUTATIONAL BIOLOGY
卷 16, 期 8, 页码 1101-1116出版社
MARY ANN LIEBERT, INC
DOI: 10.1089/cmb.2009.0047
关键词
bidirected flow; bidirected graph; Chinese postman; de Bruijn graph; genome assembly; matepairs; sequence assembly
类别
资金
- NIGMS NIH HHS [R01-GM81080] Funding Source: Medline
Whole genome shotgun assembly is the process of taking many short sequenced segments ( reads) and reconstructing the genome from which they originated. We demonstrate how the technique of bidirected network flow can be used to explicitly model the double-stranded nature of DNA for genome assembly. By combining an algorithm for the Chinese Postman Problem on bidirected graphs with the construction of a bidirected de Bruijn graph, we are able to find the shortest double-stranded DNA sequence that contains a given set of k-long DNA molecules. This is the first exact polynomial time algorithm for the assembly of a double-stranded genome. Furthermore, we propose a maximum likelihood framework for assembling the genome that is the most likely source of the reads, in lieu of the standard maximum parsimony approach ( which finds the shortest genome subject to some constraints). In this setting, we give a bidirected network flow-based algorithm that, by taking advantage of high coverage, accurately estimates the copy counts of repeats in a genome. Our second algorithm combines these predicted copy counts with matepair data in order to assemble the reads into contigs. We run our algorithms on simulated read data from Escherichia coli and predict copy counts with extremely high accuracy, while assembling long contigs.
作者
我是这篇论文的作者
点击您的名字以认领此论文并将其添加到您的个人资料中。
推荐
暂无数据