Family reunion via error correction: an efficient analysis of duplex sequencing data

BMC BIOINFORMATICS (2020)

Journal

BMC BIOINFORMATICS

Volume 21, Issue 1

Publisher

BMC

DOI: 10.1186/s12859-020-3419-8

Keywords

Duplex sequence; Low frequency variants; Barcodes; Error correction

Funding

Eberly College of Science at the Pennsylvania State University
NIH [U41 HG006620, R01 AI134384-01, R01GM116044]
NSF ABI Grant [1661497]
Linz Institute of Technology [LIT213201001]
Austrian Science Fund [FWFP30867000]
Schrödinger Fellowship from the Austrian Science Fund (FWF) [J-4096]
Office of Science Engagement at Penn State
Huck Institute of Life Sciences at Penn State
Institute for CyberScience at Penn State
Pennsylvania Department of Health
Eberly College of Sciences at Penn State

Abstract

Background: Duplex sequencing is the most accurate approach for identification of sequence variants present at very low frequencies. Its power comes from pooling together multiple descendants of both strands of original DNA molecules, which allows distinguishing true nucleotide substitutions from PCR amplification and sequencing artifacts. This strategy comes at a cost: sequencing the same molecule multiple times increases dynamic range but significantly diminishes coverage, making whole genome duplex sequencing prohibitively expensive. Furthermore, every duplex experiment produces a substantial proportion of singleton reads that cannot be used in the analysis and are thrown away.

Results: In this paper we demonstrate that a significant fraction of these singleton reads contain PCR or sequencing errors within their duplex tags. Correcting such errors allows these reads to be reunited with their respective families, increasing the output of the method and making it more cost-effective.

Conclusions: We combine an error correction strategy with a number of algorithmic improvements in a new version of the duplex analysis software, Du Novo 2.0. It is written in Python, C, AWK, and Bash. It is open source and readily available through Galaxy, Bioconda, and GitHub.
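
To make the idea of reuniting singleton reads with their families concrete, here is a minimal Python sketch. It is an illustration under simplifying assumptions and is not Du Novo's implementation: tags are treated as plain equal-length strings, a singleton is merged into a larger family only when exactly one such family lies within a fixed number of mismatches (Hamming distance), and the function names `hamming`, `reunite_families`, and `family_sizes` are hypothetical.

```python
from collections import Counter


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length tags."""
    if len(a) != len(b):
        raise ValueError("tags must have equal length")
    return sum(x != y for x, y in zip(a, b))


def reunite_families(tags, max_dist=1):
    """Return a dict mapping each observed tag to its corrected tag.

    Assumption (not from the paper): a tag seen exactly once is treated as a
    possible error and reassigned when exactly one multi-read family has a tag
    within `max_dist` mismatches. Ambiguous or distant singletons are kept.
    """
    counts = Counter(tags)
    families = [tag for tag, n in counts.items() if n > 1]
    corrected = {}
    for tag, n in counts.items():
        if n > 1:
            corrected[tag] = tag  # already part of a family
            continue
        candidates = [
            fam for fam in families
            if len(fam) == len(tag) and hamming(tag, fam) <= max_dist
        ]
        # Merge only when the choice of family is unambiguous.
        corrected[tag] = candidates[0] if len(candidates) == 1 else tag
    return corrected


def family_sizes(tags, corrected):
    """Count reads per family after applying the tag corrections."""
    return dict(Counter(corrected[tag] for tag in tags))


# Toy input: one family of three reads, one singleton that differs from it
# by a single base, one family of two reads, and one unrelated singleton.
tags = ["ACGTACGT"] * 3 + ["ACGTACGA"] + ["TTTTCCCC"] * 2 + ["GGGGAAAA"]
corrected = reunite_families(tags)
sizes = family_sizes(tags, corrected)
print(sizes)
```

In this toy input the singleton `ACGTACGA` joins the `ACGTACGT` family, which grows from three reads to four, while `GGGGAAAA` has no nearby family and remains a singleton. A real pipeline would also need to account for insertions and deletions in tags and for the two-strand structure of duplex tags, which this sketch ignores.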