Article

ANALYSIS OF CONTEXT-DEPENDENT ERRORS FOR ILLUMINA SEQUENCING

Publisher

IMPERIAL COLLEGE PRESS
DOI: 10.1142/S0219720012410053

Keywords

Next-generation sequencing; statistical measures; error probability; quality value

The new generation of short-read sequencing technologies requires reliable measures of data quality. Such measures are especially important for variant calling. In the particular case of SNP calling, however, a great number of false-positive SNPs (FP-SNPs) may be obtained, so one needs to distinguish putative SNPs from sequencing or other errors. We found that an FP-SNP can be identified not only from the probability of a sequencing error (i.e. the quality value) but also from the conditional probability of correcting this error (the second best call probability, conditional on that of the first call). Surprisingly, around 80% of mismatches can be corrected with this second call. Another way to reduce the rate of FP-SNPs is to retrieve DNA motifs that seem to be prone to sequencing errors and to attach a corresponding conditional quality value to these motifs. We have developed several measures to distinguish between sequencing errors and candidate SNPs, based on a base call's nucleotide context and its mismatch type. In addition, we suggest a simple method to correct the majority of mismatches, based on the conditional probability of their second best intensity call, and we attach to each mismatch a corresponding second-call confidence (quality value) of being corrected.
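
The idea of a second-call confidence can be illustrated with a minimal sketch. It is not the authors' implementation: the per-base intensities, the function names, and the choice to normalise the second best intensity by everything except the first call are all assumptions made here for illustration. Only the Phred relation Q = -10 log10(p) is standard.

```python
import math


def phred_to_error_prob(q):
    """Standard Phred relation: error probability p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Inverse Phred relation: Q = -10 * log10(p)."""
    return -10 * math.log10(p)


def second_call_confidence(intensities):
    """Rank base calls by intensity and estimate how likely the second
    best call is correct, given that the first call is wrong.

    `intensities` is a hypothetical dict mapping each base (A, C, G, T)
    to a non-negative signal intensity. The estimate is an assumption
    of this sketch: second best intensity divided by the total
    intensity of all bases other than the first call.
    """
    if sum(intensities.values()) <= 0:
        raise ValueError("intensities must have a positive sum")
    ranked = sorted(intensities.items(), key=lambda kv: kv[1], reverse=True)
    (first, i1), (second, i2) = ranked[0], ranked[1]
    remainder = sum(intensities.values()) - i1
    # If nothing is left after the first call, there is no second call to trust.
    p_second = i2 / remainder if remainder > 0 else 0.0
    return first, second, p_second


if __name__ == "__main__":
    first, second, p = second_call_confidence({"A": 60, "C": 30, "G": 6, "T": 4})
    print(first, second, p)
```

Under these assumptions, a mismatch whose second call has a high conditional probability would be a candidate for correction rather than for SNP calling; the threshold for "high" is left open here.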
