Article

When less is more: 'slicing' sequencing data improves read decoding accuracy and de novo assembly quality

BIOINFORMATICS (2015)

Journal

BIOINFORMATICS

Volume 31, Issue 18, Pages 2972-2980

Publisher

OXFORD UNIV PRESS

DOI: 10.1093/bioinformatics/btv311

Funding

US National Science Foundation [DBI-1062301, IIS-1302134]
USDA National Institute of Food and Agriculture [2009-65300-05645]
USAID Feed the Future program [AID-OAA-A-13-00070]
UC Riverside Agricultural Experiment Station Hatch Project [CA-R-BPS-5306-H]
Directorate for Computer & Information Science & Engineering [1302134, 1526742] Funding Source: National Science Foundation
Division of Information & Intelligent Systems [1302134, 1526742] Funding Source: National Science Foundation

Abstract

Motivation: Since the invention of DNA sequencing in the 1970s, computational biologists have had to deal with the problem of de novo genome assembly with limited (or insufficient) depth of sequencing. In this work, we investigate the opposite problem, that is, the challenge of dealing with excessive depth of sequencing.

Results: We explore the effect of ultra-deep sequencing data in two domains: (i) the problem of decoding reads to bacterial artificial chromosome (BAC) clones (in the context of the combinatorial pooling design we have recently proposed), and (ii) the problem of de novo assembly of BAC clones. Using real ultra-deep sequencing data, we show that when the depth of sequencing increases beyond a certain threshold, sequencing errors make these two problems progressively harder (instead of easier, as one would expect with error-free data), and as a consequence the quality of the solution degrades as more data are added. For the first problem, we propose an effective solution based on 'divide and conquer': we 'slice' a large dataset into smaller samples of optimal size, decode each slice independently, and then merge the results. Experimental results on over 15 000 barley BACs and over 4000 cowpea BACs demonstrate a significant improvement in the quality of the decoding and the final assembly. For the second problem, we show for the first time that modern de novo assemblers cannot take advantage of ultra-deep sequencing data.
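
The 'slice, decode, merge' idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The names `slice_reads` and `decode_in_slices`, the `(read_id, sequence)` representation of a read, the random shuffling, and the merge-by-union step are all assumptions made for illustration, and `decode` stands in for whatever read-to-BAC decoder is actually used. The abstract does not say how the optimal slice size is chosen, so `slice_size` is simply a parameter here.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

# Assumed representation: a read is a (read_id, sequence) pair.
Read = Tuple[str, str]
# Hypothetical decoder interface: maps a batch of reads to {read_id: clone_label}.
Decoder = Callable[[List[Read]], Dict[str, str]]


def slice_reads(reads: Sequence[Read], slice_size: int, seed: int = 0) -> List[List[Read]]:
    """Shuffle the reads and partition them into slices of at most slice_size."""
    if slice_size <= 0:
        raise ValueError("slice_size must be positive")
    shuffled = list(reads)
    # Seeded shuffle (an assumption) so each slice is a reproducible random sample.
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i:i + slice_size] for i in range(0, len(shuffled), slice_size)]


def decode_in_slices(reads: Sequence[Read], slice_size: int, decode: Decoder,
                     seed: int = 0) -> Dict[str, str]:
    """Decode each slice independently, then merge the per-slice assignments."""
    merged: Dict[str, str] = {}
    for chunk in slice_reads(reads, slice_size, seed):
        # Each read falls in exactly one slice, so a plain union suffices
        # as the merge step in this sketch.
        merged.update(decode(chunk))
    return merged
```

With a real decoder, the output for a given read may depend on which other reads share its slice, which is why the slice size matters. A decoder that labels each read in isolation would give the same result with or without slicing.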
