Article

Mash Screen: high-throughput sequence containment estimation for genome discovery

Journal

GENOME BIOLOGY
Volume 20, Issue 1

Publisher

BMC
DOI: 10.1186/s13059-019-1841-x

Keywords

MinHash; Metagenomics; Sequencing; SRA; Viral Discovery; Polyomavirus

Funding

  1. Intramural Research Programs of the National Human Genome Research Institute
  2. National Cancer Institute, National Institutes of Health

The MinHash algorithm has proven effective for rapidly estimating the resemblance of two genomes or metagenomes. However, this method cannot reliably estimate the containment of a genome within a metagenome. Here, we describe an online algorithm capable of measuring the containment of genomes and proteomes within either assembled or unassembled sequencing read sets. We describe several use cases, including contamination screening and retrospective analysis of metagenomes for novel genome discovery. Using this tool, we provide containment estimates for every NCBI RefSeq genome within every SRA metagenome and demonstrate the identification of a novel polyomavirus species from a public metagenome.
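
The difference between resemblance and containment can be illustrated with a toy example. The sketch below is not the Mash Screen implementation; it is a minimal illustration that assumes a bottom-k MinHash sketch of the reference, a single streaming pass over the reads, and a containment score equal to the fraction of sketch hashes observed. The k-mer length, the sketch size, the BLAKE2b stand-in hash, and all function names are assumptions made for this example, and reverse complements are ignored for brevity.

```python
import hashlib
import random

K = 21             # k-mer length (assumed for illustration)
SKETCH_SIZE = 200  # minimum hashes kept per reference (assumed)


def kmers(seq, k=K):
    """Yield every overlapping k-mer of a sequence."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def hash_kmer(kmer):
    """Map a k-mer to a 64-bit integer with a stable stand-in hash."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def bottom_sketch(seq, size=SKETCH_SIZE):
    """Return the `size` smallest distinct k-mer hashes of a sequence."""
    return set(sorted({hash_kmer(km) for km in kmers(seq)})[:size])


def screen(reference_sketch, reads):
    """Stream the reads once; return the fraction of sketch hashes observed."""
    seen = set()
    for read in reads:
        for km in kmers(read):
            h = hash_kmer(km)
            if h in reference_sketch:
                seen.add(h)
    return len(seen) / len(reference_sketch)


def exact_jaccard(seq, reads):
    """Exact k-mer resemblance between one sequence and a read set."""
    a = set(kmers(seq))
    b = set()
    for read in reads:
        b.update(kmers(read))
    return len(a & b) / len(a | b)


rng = random.Random(0)


def random_dna(n):
    """Generate a random DNA string of length n (toy data only)."""
    return "".join(rng.choice("ACGT") for _ in range(n))


genome = random_dna(2000)        # small toy "reference genome"
background = random_dna(50000)   # unrelated sequence
mixture = [genome, background]   # toy "metagenome" containing the genome

containment = screen(bottom_sketch(genome), mixture)
absent = screen(bottom_sketch(genome), [background])
jaccard = exact_jaccard(genome, mixture)

print(f"containment when present: {containment:.2f}")
print(f"containment when absent:  {absent:.2f}")
print(f"resemblance (Jaccard):    {jaccard:.3f}")
```

In this toy setting the genome is fully present in the mixture, so the containment score is high, while the resemblance stays low because the unrelated sequence dominates the union of k-mers. This is the situation in which a resemblance estimate alone gives little information about whether a genome occurs in a metagenome.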
