Article

SequencErr: measuring and suppressing sequencer errors in next-generation sequencing data

Journal

GENOME BIOLOGY
Volume 22, Issue 1

Publisher

BMC
DOI: 10.1186/s13059-020-02254-2

Keywords

Sequencer; instrument error; Error suppression; DNA sequencing

Funding

  1. Fund for Innovation in Cancer Informatics from the National Institutes of Health
  2. Cancer Center Support Grant from the National Institutes of Health [P30CA021765]
  3. American Lebanese Syrian Associated Charities (ALSAC)

The study proposes SequencErr, a computational method for measuring the errors introduced by sequencing instruments, and finds the sequencer error rate to be around 10 per million. SequencErr achieved a 10-fold lower error rate than popular error correction methods and offers new insight into DNA sequencing errors.
Background: There is currently no method to precisely measure the errors that occur in the sequencing instrument (the sequencer). Such a measurement is critical for next-generation sequencing applications that aim to discover the genetic makeup of heterogeneous cellular populations.

Results: We propose a novel computational method, SequencErr, to address this challenge by measuring the base correspondence between overlapping regions in forward and reverse reads. An analysis of 3777 public datasets from 75 research institutions in 18 countries revealed the sequencer error rate to be approximately 10 per million (pm), with 1.4% of sequencers and 2.7% of flow cells having error rates > 100 pm. At the flow cell level, error rates are elevated in the bottom surfaces, and > 90% of HiSeq and NovaSeq flow cells have at least one outlier error-prone tile. By sequencing a common DNA library on different sequencers, we demonstrate that sequencers with high error rates have reduced overall sequencing accuracy, and that removing outlier error-prone tiles improves sequencing accuracy. We demonstrate that SequencErr can reveal novel insights relative to the popular quality control method FastQC and can achieve a 10-fold lower error rate than popular error correction methods, including Lighter and Musket.

Conclusions: Our study reveals novel insights into the nature of DNA sequencing errors incurred on DNA sequencers. Our method can be used to assess, calibrate, and monitor sequencer accuracy, and to computationally suppress sequencer errors in existing datasets.
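
The core idea of comparing overlapping bases in a read pair can be illustrated with a minimal sketch. This is not the SequencErr implementation. It assumes that both reads come from the same short fragment, that the overlap length is already known, and that any disagreement in the overlap is counted as one error. The function names (`reverse_complement`, `overlap_mismatches`, `error_rate_per_million`) are hypothetical and chosen for illustration only.

```python
from typing import Tuple

# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def overlap_mismatches(forward: str, reverse: str, overlap_len: int) -> Tuple[int, int]:
    """Count disagreeing bases in the overlap of a read pair.

    Assumption: the last `overlap_len` bases of the forward read and the
    first `overlap_len` bases of the reverse-complemented reverse read
    cover the same template positions. Positions where either read has
    an N are skipped. Returns (mismatches, bases_compared).
    """
    if not 1 <= overlap_len <= min(len(forward), len(reverse)):
        raise ValueError("overlap_len must be between 1 and the shorter read length")
    fwd_part = forward[-overlap_len:]
    rev_part = reverse_complement(reverse)[:overlap_len]
    mismatches = 0
    compared = 0
    for f, r in zip(fwd_part, rev_part):
        if f == "N" or r == "N":
            continue
        compared += 1
        if f != r:
            mismatches += 1
    return mismatches, compared


def error_rate_per_million(mismatches: int, compared: int) -> float:
    """Express a mismatch count as errors per million compared bases (pm)."""
    return 1e6 * mismatches / compared


# Toy example: a 20-base fragment sequenced with 15-base reads from each
# end, so the two reads overlap by 2 * 15 - 20 = 10 bases.
fragment = "ACGTACGTAGCTAGCTAGGA"
fwd = fragment[:15]
rev = reverse_complement(fragment[-15:])
print(overlap_mismatches(fwd, rev, 10))
```

Because both reads in the overlap derive from the same DNA molecule, a disagreement there is more plausibly attributed to the instrument than to library preparation, which is what makes the overlap useful for isolating sequencer errors. Summing mismatches and compared bases over many read pairs, grouped by sequencer, flow cell, or tile, gives a per-million rate that can be compared against figures such as the 100 pm threshold mentioned above.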
