Article

WHATSHAP: Weighted Haplotype Assembly for Future-Generation Sequencing Reads

JOURNAL OF COMPUTATIONAL BIOLOGY (2015)

Journal

JOURNAL OF COMPUTATIONAL BIOLOGY

Volume 22, Issue 6, Pages 498-509

Publisher

MARY ANN LIEBERT, INC

DOI: 10.1089/cmb.2014.0157

Keywords

algorithms; combinatorial optimization; dynamic programming; haplotypes; next generation sequencing

Categories

Biochemical Research Methods; Biotechnology & Applied Microbiology; Computer Science, Interdisciplinary Applications; Mathematical & Computational Biology; Statistics & Probability

Funding

Marie Curie ABCDE Fellowship of ERCIM
Veni and Vidi grants of the Netherlands Organisation for Scientific Research (NWO)


Abstract

The human genome is diploid, which requires assigning heterozygous single nucleotide polymorphisms (SNPs) to the two copies of the genome. The resulting haplotypes, lists of SNPs belonging to each copy, are crucial for downstream analyses in population genetics. Currently, statistical approaches, which are oblivious to direct read information, constitute the state of the art. Haplotype assembly, which performs phasing directly from sequencing reads, suffers from the fact that current-generation sequencing reads are too short for genome-wide phasing. While future-technology sequencing reads will contain enough SNPs per read for phasing, they are also likely to suffer from higher sequencing error rates. Currently, no haplotype assembly approach takes both increasing read lengths and sequencing error information into account. Here, we present WhatsHap, the first approach that yields provably optimal solutions to the weighted minimum error correction problem in runtime linear in the number of SNPs. WhatsHap is a fixed parameter tractable (FPT) approach with coverage as the parameter. We demonstrate that WhatsHap can handle datasets of coverage up to 20x, and that 15x is generally enough for reliably phasing long reads, even at significantly elevated sequencing error rates. We also find that the switch and flip error rates of the haplotypes we output compare favorably with those of state-of-the-art statistical phasers.
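The abstract names the weighted minimum error correction (wMEC) problem and a dynamic program whose cost is exponential only in coverage. The Python sketch below illustrates that idea under our own simplifying assumptions; it is not the WhatsHap implementation. The names `Read`, `column_cost` and `solve_wmec` are hypothetical. A read is modeled as a dict mapping an SNP column index to `(allele, weight)`, where the weight could, for example, be derived from base quality (also an assumption). A read counts as active from its first to its last covered column. At each column, each part of a bipartition keeps its heavier allele and pays the weight of the reads it must flip. The table holds one entry per bipartition of the active reads, which gives 2^k entries for coverage k. For bounded coverage, the runtime therefore grows linearly with the number of columns. The sketch does not force the two haplotypes to differ at every column, and it omits the engineering a practical tool would need.

```python
from itertools import product
from typing import Dict, List, Tuple

# Hypothetical read model: SNP column index -> (allele 0/1, weight).
Read = Dict[int, Tuple[int, float]]


def column_cost(reads: List[Read], active: List[int], sides: Tuple[int, ...], j: int) -> float:
    """Cost of making both parts of one bipartition conflict-free at column j."""
    # w[part][allele]: total weight of reads in `part` showing `allele` at column j
    w = [[0.0, 0.0], [0.0, 0.0]]
    for r, s in zip(active, sides):
        entry = reads[r].get(j)
        if entry is not None:  # a read may span column j without a call there
            allele, weight = entry
            w[s][allele] += weight
    # Each part keeps its heavier allele and pays for flipping the lighter one.
    return min(w[0]) + min(w[1])


def solve_wmec(reads: List[Read], n_cols: int) -> Tuple[float, List[int]]:
    """Column-wise DP over bipartitions of active reads (each read needs >= 1 call)."""
    spans = [(min(r), max(r)) for r in reads]
    prev_active: List[int] = []
    # Virtual table before column 0: a single empty bipartition with cost 0.
    prev_table: Dict[Tuple[int, ...], Tuple[float, Tuple[int, ...]]] = {(): (0.0, ())}
    tables, actives = [], []
    for j in range(n_cols):
        active = [i for i, (a, b) in enumerate(spans) if a <= j <= b]
        shared = [i for i in active if i in prev_active]
        prev_pos = [prev_active.index(i) for i in shared]
        cur_pos = [active.index(i) for i in shared]
        # Minimise over reads that ended: keep the best previous entry
        # for every assignment of the reads shared with column j.
        best_prev: Dict[Tuple[int, ...], Tuple[float, Tuple[int, ...]]] = {}
        for bp, (cost, _) in prev_table.items():
            key = tuple(bp[p] for p in prev_pos)
            if key not in best_prev or cost < best_prev[key][0]:
                best_prev[key] = (cost, bp)
        table = {}
        # Enumerate all 2^k bipartitions of the k active reads (k = coverage).
        for sides in product((0, 1), repeat=len(active)):
            key = tuple(sides[p] for p in cur_pos)
            cost_prev, bp = best_prev[key]
            table[sides] = (cost_prev + column_cost(reads, active, sides, j), bp)
        tables.append(table)
        actives.append(active)
        prev_active, prev_table = active, table

    sides_of = [0] * len(reads)
    if not tables:
        return 0.0, sides_of
    best = min(tables[-1], key=lambda s: tables[-1][s][0])
    total = tables[-1][best][0]
    # Follow back-pointers to recover the part each read was assigned to.
    for j in range(len(tables) - 1, -1, -1):
        for i, s in zip(actives[j], best):
            sides_of[i] = s
        best = tables[j][best][1]
    return total, sides_of


if __name__ == "__main__":
    # Toy input: read 2 disagrees with read 0 only at column 1.
    reads = [
        {0: (0, 1), 1: (0, 1), 2: (0, 1)},
        {0: (1, 1), 1: (1, 1), 2: (1, 1)},
        {0: (0, 1), 1: (1, 1), 2: (0, 1)},
        {1: (1, 5), 2: (1, 5)},
    ]
    cost, sides = solve_wmec(reads, n_cols=3)
    print(cost)  # → 1.0
```

In the toy input, the optimal partition groups reads 0 and 2 against reads 1 and 3. It pays a weight of 1 for flipping read 2's allele at column 1. The haplotypes then follow from the allele that each part kept at each column.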
