Article

Comparative pangenomics: analysis of 12 microbial pathogen pangenomes reveals conserved global structures of genetic and functional diversity

Journal

BMC GENOMICS
Volume 23, Issue 1

Publisher

BMC
DOI: 10.1186/s12864-021-08223-8

Keywords

Pangenome; Core genome; Comparative genomics; Multispecies; Heaps' law; Functional diversity; Sequence diversity; Protein domains; Aminoacyl-tRNA synthetases

Funding

  1. National Institute of Allergy and Infectious Diseases [AI124316]
  2. National Institutes of Health [T32GM8806]

With the growth of publicly available genome sequences, comparative pangenomics methods have provided valuable insights into genetic diversity across multiple species. This study found that pangenome openness is associated with a species' phylogenetic placement, that relationships between gene function and frequency are conserved across species, that the core-genome genes with the highest sequence diversity are functionally diverse, and that certain protein domains are consistently mutation-enriched across multiple species.

Background: With the exponential growth of publicly available genome sequences, pangenome analyses have provided increasingly complete pictures of genetic diversity for many microbial species. However, relatively few studies have scaled beyond single pangenomes to compare global genetic diversity both within and across different species. We present here several methods for comparative pangenomics that can be used to contextualize multi-pangenome scale genetic diversity with gene function for multiple species at multiple resolutions: pangenome shape, genes, sequence variants, and positions within variants.

Results: Applied to 12,676 genomes across 12 microbial pathogenic species, we observed several shared resolution-specific patterns of genetic diversity. First, pangenome openness is associated with species' phylogenetic placement. Second, relationships between gene function and frequency are conserved across species, with core genomes enriched for metabolic and ribosomal genes and accessory genomes for trafficking, secretion, and defense-associated genes. Third, genes in core genomes with the highest sequence diversity are functionally diverse. Finally, certain protein domains are consistently mutation-enriched across multiple species, especially among aminoacyl-tRNA synthetases, where the extent of a domain's mutation enrichment is strongly function-dependent.

Conclusions: These results illustrate the value of each resolution for uncovering distinct aspects of the relationship between genetic and functional diversity across multiple species. With the continued growth of the number of sequenced genomes, these methods will reveal additional universal patterns of genetic diversity at the pangenome scale.
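
As an illustration of the "pangenome shape" and "genes" resolutions named above, the following is a minimal sketch, not the authors' pipeline. It assumes each genome is given as a set of gene-family identifiers, splits gene families into core and accessory by frequency, and fits a Heaps' law curve P(N) = kappa * N^gamma to the mean pangenome size over random genome orderings. The function names, the 0.99 core threshold, the number of permutations, and the toy gene sets are all assumptions made for this example. Under this form of Heaps' law, a larger gamma is commonly read as a more open pangenome.

```python
import math
import random


def split_core_accessory(genomes, core_threshold=0.99):
    """Split gene families by the fraction of genomes that carry them.

    core_threshold is an illustrative assumption, not a value from the paper.
    """
    counts = {}
    for genes in genomes:
        for gene in genes:
            counts[gene] = counts.get(gene, 0) + 1
    n = len(genomes)
    core = {g for g, c in counts.items() if c / n >= core_threshold}
    accessory = set(counts) - core
    return core, accessory


def pangenome_curve(genomes, n_permutations=20, seed=0):
    """Mean pangenome size after adding 1..N genomes, over random orderings."""
    rng = random.Random(seed)
    totals = [0.0] * len(genomes)
    for _ in range(n_permutations):
        order = list(genomes)
        rng.shuffle(order)
        seen = set()
        for i, genes in enumerate(order):
            seen |= genes  # accumulate every gene family seen so far
            totals[i] += len(seen)
    return [t / n_permutations for t in totals]


def fit_heaps(curve):
    """Least-squares fit of log P = log kappa + gamma * log N."""
    xs = [math.log(i + 1) for i in range(len(curve))]
    ys = [math.log(p) for p in curve]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    gamma = sxy / sxx
    kappa = math.exp(mean_y - gamma * mean_x)
    return kappa, gamma


# Hypothetical toy data: four genomes as sets of gene-family names.
toy_genomes = [
    {"rpoB", "gyrA", "tetA"},
    {"rpoB", "gyrA", "blaTEM"},
    {"rpoB", "gyrA"},
    {"rpoB", "gyrA", "tetA", "mcr1"},
]

core, accessory = split_core_accessory(toy_genomes)
curve = pangenome_curve(toy_genomes)
kappa, gamma = fit_heaps(curve)
print("core:", sorted(core))
print("accessory:", sorted(accessory))
print("Heaps' law fit: kappa=%.3f, gamma=%.3f" % (kappa, gamma))
```

Averaging over random orderings removes the dependence of the curve on the order in which genomes are added. A real analysis would start from clustered gene families across thousands of genomes rather than a hand-written list.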
