Article

SuRankCo: supervised ranking of contigs in de novo assemblies

Journal

BMC Bioinformatics
Volume 16

Publisher

BMC
DOI: 10.1186/s12859-015-0644-7

Keywords

De novo assembly; Genome assembly; Next generation sequencing; Contigs; Quality control; Machine learning; Random forest

Funding

  1. German Federal Ministry of Health [IIA5-2512-FSB-725]
  2. CAPES - Ciencia sem Fronteiras [BEX 13472/13-5]

Background: Evaluating the quality and reliability of a de novo assembly, and of single contigs in particular, is challenging because a ground truth is usually not readily available and numerous factors may influence the results. Currently available procedures provide assembly scores but lack a comparative quality ranking of contigs within an assembly.

Results: We present SuRankCo, which uses a machine learning approach to predict quality scores for contigs and thereby enables the ranking of contigs within an assembly. The result is a sorted contig set that allows selective use of contigs in downstream analysis. Benchmarking on datasets with known ground truth shows promising sensitivity and specificity and compares favorably with existing methodology.

Conclusions: SuRankCo analyzes the reliability of de novo assemblies at the contig level and thereby allows quality control and ranking prior to further downstream and validation experiments.
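The idea of a sorted contig set that supports selective use downstream can be illustrated with a minimal sketch. This is not SuRankCo's code or interface: the `ScoredContig` type, the function names, the score range, and the threshold are all hypothetical, and the scores are made-up values standing in for the output of a trained model (the keywords mention random forests, but no model is trained here).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredContig:
    contig_id: str          # hypothetical identifier
    predicted_score: float  # hypothetical quality score; higher means more reliable


def rank_contigs(contigs):
    """Return contigs sorted from highest to lowest predicted score."""
    # Ties are broken by contig_id so the ordering is deterministic.
    return sorted(contigs, key=lambda c: (-c.predicted_score, c.contig_id))


def select_reliable(ranked, min_score):
    """Keep only the contigs whose predicted score reaches min_score."""
    return [c for c in ranked if c.predicted_score >= min_score]


# Made-up scores standing in for the predictions of a trained classifier.
contigs = [
    ScoredContig("contig_1", 0.42),
    ScoredContig("contig_2", 0.91),
    ScoredContig("contig_3", 0.77),
]

ranked = rank_contigs(contigs)
reliable = select_reliable(ranked, min_score=0.75)  # threshold is an assumption

print([c.contig_id for c in ranked])
print([c.contig_id for c in reliable])
```

In practice the cutoff would depend on how the scores are calibrated and on how much unreliable sequence the downstream analysis can tolerate.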
