4.6 Article

Understanding the Genetic Diversity of Mycobacterium africanum Using Phylogenetics and Population Genomics Approaches

Journal

FRONTIERS IN GENETICS
Volume 13, Issue -, Pages -

Publisher

FRONTIERS MEDIA SA
DOI: 10.3389/fgene.2022.800083

Keywords

Mycobacterium africanum; population genomics; SNP; lineage; bioinformatics

Funding

  1. National Supercomputing Mission (NSM) of the Government of India

Ask authors/readers for more resources

This study analyzed core-cluster-specific single nucleotide polymorphisms (SNPs) of Mycobacterium tuberculosis var. africanum to determine its lineage and clustering patterns. The results identified two lineages, L5 and L6, with further sub-lineages within each lineage. The analysis of SNPs also revealed genetic variations associated with strain specificity and growth attenuation.
A total of two lineages of Mycobacterium tuberculosis var. africanum (Maf), L5 and L6, which are members of the Mycobacterium tuberculosis complex (MTBC), are responsible for causing tuberculosis in West Africa. Regions of difference (RDs) are usually used for delineation of MTBC. With increased data availability, single nucleotide polymorphisms (SNPs) promise to provide better resolution. Publicly available 380 Maf samples were analyzed for identification of core-cluster-specific-SNPs, while additional 270 samples were used for validation. RD-based methods were used for lineage-assignment, wherein 31 samples remained unidentified. The genetic diversity of Maf was estimated based on genome-wide SNPs using phylogeny and population genomics approaches. Lineage-based clustering (L5 and L6) was observed in the whole genome phylogeny with distinct sub-clusters. Population stratification using both model-based and de novo approaches supported the same observations. L6 was further delineated into three sub-lineages (L6.1-L6.3), whereas L5 was grouped as L5.1 and L5.2 based on the occurrence of RD711. L5.1 and L5.2 were further divided into two (L5.1.1 and L5.1.2) and four (L5.2.1-L5.2.4) sub-clusters, respectively. Unassigned samples could be assigned to definite lineages/sub-lineages based on clustering observed in phylogeny along with high-confidence posterior membership scores obtained during population stratification. Based on the (sub)-clusters delineated, core-cluster-specific-SNPs were derived. Synonymous SNPs (137 in L5 and 128 in L6) were identified as biomarkers and used for validation. Few of the cluster-specific missense variants in L5 and L6 belong to the central carbohydrate metabolism pathway which include His6Tyr (Rv0946c), Glu255Ala (Rv1131), Ala309Gly (Rv2454c), Val425Ala and Ser112Ala (Rv1127c), Gly198Ala (Rv3293) and Ile137Val (Rv0363c), Thr421Ala (Rv0896), Arg442His (Rv1248c), Thr218Ile (Rv1122), and Ser381Leu (Rv1449c), hinting at the differential growth attenuation. Genes harboring multiple (sub)-lineage-specific core-cluster SNPs such as Lys117Asn, Val447Met, and Ala455Val (Rv0066c; icd2) present across L6, L6.1, and L5, respectively, hinting at the association of these SNPs with selective advantage or host-adaptation. Cluster-specific SNPs serve as additional markers along with RD-regions for Maf delineation. The identified SNPs have the potential to provide insights into the genotype-phenotype correlation and clues for endemicity of Maf in the African population.

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

4.6
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

No Data Available
No Data Available