Article

Filtering duplicate reads from 454 pyrosequencing data

Journal

BIOINFORMATICS
Volume 29, Issue 7, Pages 830-836

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bioinformatics/btt047

Funding

  1. National Program for Research in Functional Genomics in Norway (FUGE) in the Research Council of Norway (RCN grant) [183640/S10]

Motivation: In recent years, 454 pyrosequencing has emerged as an efficient alternative to traditional Sanger sequencing and is widely used in both de novo whole-genome sequencing and metagenomics. The latter application in particular is extremely sensitive to sequencing errors and artificially duplicated reads. Both are common in 454 pyrosequencing and can strongly bias estimates of the diversity and composition of a sample. Several tools exist that aim to remove both sequencing noise and duplicates. However, duplicate removal is often based on nucleotide sequences rather than on the underlying flow values, which contain additional information.

Results: With the novel tool JATAC, we present an approach towards more accurate duplicate removal by analysing flow values directly. Making use of previous findings on 454 flow data characteristics, we combine read clustering with Bayesian distance measures. Finally, we provide a benchmark against an existing algorithm.
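To illustrate why flow values can carry more information than base-called sequences, here is a minimal sketch. It is not JATAC's algorithm: the mean absolute difference used here is a simple stand-in for the Bayesian distance measures named in the abstract, and the greedy clustering, the threshold of 0.15, the function names and the example flow values are all hypothetical choices made for illustration. The cyclic flow order TACG is also an assumption.

```python
from typing import List, Sequence


def base_call(flows: Sequence[float], flow_order: str = "TACG") -> str:
    """Round each flow value to a homopolymer length and emit that many bases."""
    bases = []
    for i, value in enumerate(flows):
        length = int(value + 0.5)  # round half up; flow values are non-negative
        bases.append(flow_order[i % len(flow_order)] * length)
    return "".join(bases)


def flow_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute difference over the shared flows.

    Assumption: a simple stand-in, NOT the Bayesian distance used by JATAC.
    """
    n = min(len(a), len(b))
    return sum(abs(x - y) for x, y in zip(a[:n], b[:n])) / n


def cluster_duplicates(
    reads: Sequence[Sequence[float]], threshold: float = 0.15
) -> List[List[int]]:
    """Greedy clustering: a read joins the first cluster whose representative
    (its first member) lies within `threshold`; otherwise it starts a new one."""
    clusters: List[List[int]] = []
    for idx, flows in enumerate(reads):
        for cluster in clusters:
            if flow_distance(reads[cluster[0]], flows) <= threshold:
                cluster.append(idx)
                break
        else:
            clusters.append([idx])
    return clusters


# Hypothetical flowgrams: all three base-call to the same sequence,
# but the third differs clearly at the flow level.
reads = [
    [1.02, 0.05, 2.10, 0.98],
    [0.97, 0.08, 1.95, 1.04],
    [1.45, 0.40, 2.45, 0.60],
]

print([base_call(r) for r in reads])
print(cluster_duplicates(reads))
```

In this toy input, a comparison of nucleotide sequences would treat all three reads as identical, whereas the flow-level comparison groups only the first two. Whether the third read is a true duplicate is the kind of question a statistical model of flow noise, such as the one the paper describes, is meant to answer.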
