HierCC: a multi-level clustering scheme for population assignments based on core genome MLST

Journal

BIOINFORMATICS
Volume 37, Issue 20, Pages 3645-3646

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bioinformatics/btab234

Funding

  1. Wellcome Trust [202792/Z/16/Z]

pHierCC is a pipeline that defines HierCC, a scalable clustering scheme based on core genome multi-locus sequence typing (cgMLST) that supports incremental, static, multi-level cluster assignments of genomes. A companion tool, HCCeval, identifies optimal thresholds for assigning genomes to cohesive HierCC clusters. Since its implementation in EnteroBase in 2018, HierCC has genotyped more than 530,000 genomes from several bacterial genera.
Motivation: Routine infectious disease surveillance is increasingly based on large-scale whole-genome sequencing databases. Real-time surveillance would benefit from immediate assignments of each genome assembly to hierarchical population structures. Here we present pHierCC, a pipeline that defines a scalable clustering scheme, HierCC, based on core genome multi-locus typing that allows incremental, static, multi-level cluster assignments of genomes. We also present HCCeval, which identifies optimal thresholds for assigning genomes to cohesive HierCC clusters. HierCC was implemented in EnteroBase in 2018 and has since genotyped >530 000 genomes from Salmonella, Escherichia/Shigella, Streptococcus, Clostridioides, Vibrio and Yersinia.
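
The idea of multi-level cluster assignment can be illustrated with a small sketch. It is not the pHierCC implementation: it assumes single-linkage clustering on pairwise allelic differences (the abstract does not name the linkage method), treats allele `0` as a missing call, and uses hypothetical helper names (`allelic_distance`, `single_linkage`, `multi_level`) and toy five-locus profiles invented for illustration. It does not cover the incremental or static properties of HierCC.

```python
from itertools import combinations


def allelic_distance(a, b):
    """Count loci where both profiles have a called allele and the alleles differ.

    Assumption: allele 0 marks a missing call and is ignored.
    """
    return sum(1 for x, y in zip(a, b) if x and y and x != y)


def single_linkage(profiles, threshold):
    """Cluster profiles by single linkage at one allelic-distance threshold.

    Each genome is labelled with the smallest index in its cluster.
    """
    parent = list(range(len(profiles)))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(profiles)), 2):
        if allelic_distance(profiles[i], profiles[j]) <= threshold:
            ri, rj = find(i), find(j)
            if ri != rj:
                # Keep the smaller index as root so labels stay predictable.
                parent[max(ri, rj)] = min(ri, rj)

    return [find(i) for i in range(len(profiles))]


def multi_level(profiles, thresholds):
    """Return one cluster labelling per threshold (one hierarchy level each)."""
    return {t: single_linkage(profiles, t) for t in thresholds}


# Toy cgMLST profiles (hypothetical data): one allele number per locus.
profiles = [
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 2],
    [1, 1, 2, 2, 2],
    [3, 3, 3, 3, 3],
]

levels = multi_level(profiles, thresholds=(0, 1, 2, 5))
for threshold, labels in levels.items():
    print(f"threshold {threshold}: {labels}")
```

At threshold 2 the third genome joins the cluster of the first two even though it differs from the first genome at three loci, because single linkage only requires a chain of sufficiently close neighbours. Raising the threshold merges clusters and never splits them, which is what makes the levels nest into a hierarchy.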
