4.7 Article

GMcloser: closing gaps in assemblies accurately with a likelihood-based selection of contig or long-read alignments

Journal

BIOINFORMATICS
Volume 31, Issue 23, Pages 3733-3741

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bioinformatics/btv465

Funding

  1. KAKENHI from the Ministry of Education, Culture, Sports, Science and Technology of Japan [221S0002]
  2. JSPS KAKENHI Grant [26430200]
  3. Grants-in-Aid for Scientific Research [26430200, 221S0002] Funding Source: KAKEN

Motivation: Genome assemblies generated from next-generation sequencing (NGS) reads usually contain a number of gaps. Several tools have recently been developed to close these gaps with NGS reads. Although these tools close gaps efficiently, they introduce a high rate of misassembly at gap-closing sites.

Results: We found that the assembly error rates caused by these tools are 20- to 500-fold higher than the rate of errors introduced into contigs by de novo assemblers. Here we describe GMcloser, a tool that accurately closes these gaps with a preassembled contig set or a long read set (i.e. error-corrected PacBio reads). GMcloser uses likelihood-based classifiers calculated from the alignment statistics between scaffolds, contigs and paired-end reads to correctly assign contigs or long reads to gap regions of scaffolds, thereby achieving accurate and efficient gap closure. We demonstrate with sequencing data from various organisms that the gap-closing accuracy of GMcloser is 3- to 100-fold higher than that of other available tools, with similar efficiency.
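
The abstract does not spell out how the likelihood-based classifiers are computed. As a rough illustration of the general idea only, the sketch below scores candidate alignments with a log-likelihood ratio over a few binary alignment features and keeps the best candidate if it clears a threshold. The feature names, probabilities, threshold and function names are all hypothetical and are not taken from GMcloser.

```python
import math

# Hypothetical probabilities of observing each feature in alignments that are
# known to be correct versus incorrect. Illustrative values only; these are
# assumptions, not figures from the GMcloser paper.
P_CORRECT = {"high_identity": 0.9, "read_pair_support": 0.8}
P_INCORRECT = {"high_identity": 0.3, "read_pair_support": 0.2}


def log_likelihood_ratio(features, p_correct=P_CORRECT, p_incorrect=P_INCORRECT):
    """Sum log P(f | correct) / P(f | incorrect) over binary features,
    treating the features as independent (a simplifying assumption)."""
    llr = 0.0
    for name, observed in features.items():
        pc = p_correct[name] if observed else 1.0 - p_correct[name]
        pi = p_incorrect[name] if observed else 1.0 - p_incorrect[name]
        llr += math.log(pc / pi)
    return llr


def select_alignment(candidates, threshold=0.0):
    """Return the name of the candidate with the highest score above
    `threshold`, or None if no candidate clears it."""
    best_name, best_score = None, threshold
    for name, features in candidates.items():
        score = log_likelihood_ratio(features)
        if score > best_score:
            best_name, best_score = name, score
    return best_name


# Toy candidates for a single gap (hypothetical data).
candidates = {
    "contig_A": {"high_identity": True, "read_pair_support": True},
    "contig_B": {"high_identity": True, "read_pair_support": False},
    "contig_C": {"high_identity": False, "read_pair_support": False},
}
chosen = select_alignment(candidates)
```

In this toy setup only `contig_A` has a positive score, so it is the one selected; with a stricter threshold, or with only the weaker candidates available, the gap would be left open rather than filled with a doubtful alignment.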
