
SequencErr: measuring and suppressing sequencer errors in next-generation sequencing data

GENOME BIOLOGY (2021)

Journal

GENOME BIOLOGY

Volume 22, Issue 1

Publisher

BMC

DOI: 10.1186/s13059-020-02254-2

Keywords

Sequencer; instrument error; Error suppression; DNA sequencing

Funding

Fund for Innovation in Cancer Informatics from the National Institutes of Health
Cancer Center Support Grant from the National Institutes of Health [P30CA021765]
American Lebanese Syrian Associated Charities (ALSAC)

Automated Summary

The study proposes a computational method, SequencErr, that measures errors arising in the sequencing instrument itself and finds the sequencer error rate to be around 10 per million. The method achieves a 10-fold lower error rate than popular error correction methods such as Lighter and Musket, and it provides novel insights into DNA sequencing errors.

Abstract

Background: There is currently no method to precisely measure the errors that occur in the sequencing instrument (sequencer). Such a measurement is critical for next-generation sequencing applications that aim to discover the genetic makeup of heterogeneous cellular populations.

Results: We propose a novel computational method, SequencErr, to address this challenge by measuring the base correspondence between overlapping regions in forward and reverse reads. An analysis of 3777 public datasets from 75 research institutions in 18 countries revealed that the sequencer error rate is approximately 10 per million (pm) and that 1.4% of sequencers and 2.7% of flow cells have error rates > 100 pm. At the flow cell level, error rates are elevated in the bottom surfaces, and > 90% of HiSeq and NovaSeq flow cells have at least one outlier error-prone tile. By sequencing a common DNA library on different sequencers, we demonstrate that sequencers with high error rates have reduced overall sequencing accuracy and that removal of outlier error-prone tiles improves sequencing accuracy. We demonstrate that SequencErr can reveal novel insights relative to the popular quality control method FastQC and achieve a 10-fold lower error rate than popular error correction methods, including Lighter and Musket.

Conclusions: Our study reveals novel insights into the nature of DNA sequencing errors incurred on DNA sequencers. Our method can be used to assess, calibrate, and monitor sequencer accuracy, and to computationally suppress sequencer errors in existing datasets.
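The core idea in the abstract, comparing bases where a forward and a reverse read cover the same stretch of a DNA fragment, can be illustrated with a small sketch. This is not SequencErr's implementation, and the abstract does not give implementation details. The function names, the assumption that the overlap length is already known (in practice it would come from alignment), the skipping of `N` bases, and the made-up read name in the usage note are all illustrative assumptions. The tile lookup assumes the common Illumina read-name layout `instrument:run:flowcell:lane:tile:x:y`.

```python
from collections import defaultdict

# Translation table for complementing uppercase DNA bases.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def overlap_mismatches(forward, reverse, overlap_len):
    """Count disagreeing bases in the region covered by both reads.

    Assumes the fragment is at least one read length long, so the overlap
    is the last `overlap_len` bases of the forward read and the first
    `overlap_len` bases of the reverse-complemented reverse read.
    Returns (mismatches, bases_compared); positions with an N are skipped.
    """
    if overlap_len <= 0:
        return 0, 0
    fwd_part = forward[-overlap_len:]
    rev_part = reverse_complement(reverse)[:overlap_len]
    mismatches = compared = 0
    for a, b in zip(fwd_part, rev_part):
        if a == "N" or b == "N":
            continue
        compared += 1
        mismatches += a != b
    return mismatches, compared


def errors_per_million(mismatches, compared):
    """Express a mismatch count as errors per million compared bases."""
    return 1e6 * mismatches / compared


def tile_of(read_name):
    """Extract the tile field (fifth colon-separated field) from a read name."""
    fields = read_name.lstrip("@").split()[0].split(":")
    return fields[4]


def tile_error_rates(pairs):
    """Aggregate overlap mismatches per tile.

    `pairs` yields (read_name, forward, reverse, overlap_len) tuples.
    Returns {tile: errors per million compared bases}.
    """
    totals = defaultdict(lambda: [0, 0])
    for name, fwd, rev, overlap_len in pairs:
        m, n = overlap_mismatches(fwd, rev, overlap_len)
        totals[tile_of(name)][0] += m
        totals[tile_of(name)][1] += n
    return {t: errors_per_million(m, n) for t, (m, n) in totals.items() if n}
```

As a usage sketch, a 14-base fragment sequenced with 10-base reads gives a 6-base overlap: `overlap_mismatches(fragment[:10], reverse_complement(fragment[4:]), 6)` compares those six positions. Per-tile rates from `tile_error_rates` could then be screened for outliers, although the outlier criterion used in the paper is not stated in the abstract.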
