Dashing: fast and accurate genomic distances with HyperLogLog

Journal

Genome Biology
Volume 20, Issue 1

Publisher

BMC
DOI: 10.1186/s13059-019-1875-0

Keywords

Sketch data structures; HyperLogLog; Metagenomics; Alignment; Sequencing; Genomic distance

Funding

  1. National Science Foundation [IIS-1349906]
  2. National Institutes of Health/National Institute of General Medical Sciences [R01GM118568]

Dashing is a fast and accurate software tool for estimating similarities of genomes or sequencing datasets. It uses the HyperLogLog sketch together with cardinality estimation methods that are specialized for set unions and intersections. Dashing summarizes genomes more rapidly than previous MinHash-based methods while providing greater accuracy across a wide range of input sizes and sketch sizes. It can sketch and calculate pairwise distances for over 87K genomes in 6 minutes. Dashing is open source and available at https://github.com/dnbaker/dashing.
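
To make the idea in the abstract concrete, the following is a minimal Python sketch of how a HyperLogLog summary of k-mers can yield a Jaccard similarity estimate through union and intersection cardinalities. It is an illustration under stated assumptions, not Dashing's implementation: it uses the classic HyperLogLog estimator with a small-range correction rather than the specialized estimators the abstract mentions, it hashes with BLAKE2b for convenience, and it does not canonicalize k-mers. The names `HyperLogLog`, `sketch_sequence`, and `jaccard` are hypothetical and chosen only for this example.

```python
import hashlib
import math


class HyperLogLog:
    """Toy HyperLogLog with 2**p registers (illustrative, not Dashing's code)."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item):
        # 64-bit hash of the item; BLAKE2b is an assumption made for this sketch.
        digest = hashlib.blake2b(item.encode(), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        idx = h >> (64 - self.p)                  # top p bits select a register
        rest = h & ((1 << (64 - self.p)) - 1)     # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def union(self, other):
        # The sketch of a set union is the register-wise maximum.
        if self.p != other.p:
            raise ValueError("sketches must use the same p")
        out = HyperLogLog(self.p)
        out.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return out

    def cardinality(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)          # bias constant, valid for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            return m * math.log(m / zeros)        # linear counting for small sets
        return raw


def sketch_sequence(seq, k=21, p=12):
    """Insert every k-mer of seq into a fresh sketch (no canonicalization)."""
    hll = HyperLogLog(p)
    for i in range(len(seq) - k + 1):
        hll.add(seq[i:i + k])
    return hll


def jaccard(a, b):
    """Estimate |A ∩ B| / |A ∪ B| by inclusion-exclusion on sketch cardinalities."""
    union_size = a.union(b).cardinality()
    if union_size == 0:
        return 0.0
    inter_size = max(0.0, a.cardinality() + b.cardinality() - union_size)
    return inter_size / union_size


# Usage: compare two short toy sequences.
s1 = sketch_sequence("ACGTACGTTGCAACGTGGCATTAGC", k=5)
s2 = sketch_sequence("ACGTACGTTGCAACGTCCCATTAGC", k=5)
print(jaccard(s1, s2))
```

Because the union of two sketches is just a register-wise maximum, pairwise comparison never revisits the original sequences, which is what makes all-pairs distance computation over many genomes feasible.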
