Article

Identifying and removing haplotypic duplication in primary genome assemblies

Journal

BIOINFORMATICS
Volume 36, Issue 9, Pages 2896-2898

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bioinformatics/btaa025

Funding

  1. National Key Research and Development Program of China [2017YFC0907503, 2018YFC0910504, 2017YFC1201201]
  2. China Scholarship Council
  3. Wellcome Trust [WT207492, WT206194]

Motivation: Rapid development in long-read sequencing and scaffolding technologies is accelerating the production of reference-quality assemblies for large eukaryotic genomes. However, haplotype divergence in regions of high heterozygosity often results in assemblers creating two copies rather than one copy of a region, leading to breaks in contiguity and compromising downstream steps such as gene annotation. Several tools have been developed to resolve this problem. However, they either focus only on removing contained duplicate regions, also known as haplotigs, or fail to use all the relevant information and hence make errors.

Results: Here we present a novel tool, purge_dups, that uses sequence similarity and read depth to automatically identify and remove both haplotigs and heterozygous overlaps. In comparison with current tools, we demonstrate that purge_dups can reduce heterozygous duplication and increase assembly continuity while maintaining completeness of the primary assembly. Moreover, purge_dups is fully automatic and can easily be integrated into assembly pipelines.
