Article

Toward a high-quality pan-genome landscape of Bacillus subtilis by removal of confounding strains

Journal

BRIEFINGS IN BIOINFORMATICS
Volume 22, Issue 2, Pages 1951-1971

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bib/bbaa013

Keywords

pan-genome analysis; phylogenetic relationship; average nucleotide identity; artificial genome; pan-genome landscape; genome reduction

Funding

  1. National Key Research and Development Program of China [2018YFA0903700]
  2. National Natural Science Foundation of China [31571358, 21621004, 31171238, 9174611]


Pan-genome analysis is widely used to study species evolution and genetic diversity, particularly in bacteria. However, the influence of strain selection on the accuracy and reliability of pan-genome results is not well understood. This study found that including confounding strains in pan-genome analyses of Bacillus subtilis can significantly affect the quality and accuracy of the results, emphasizing the importance of removing biases from such strains in the data processing stage to achieve a closer representation of the true pan-genome landscape.
Pan-genome analysis is widely used to study the evolution and genetic diversity of species, particularly in bacteria. However, the impact of strain selection on the outcome of pan-genome analysis is poorly understood, and a standard protocol for ensuring high-quality pan-genome results is lacking. In this study, we carried out a series of pan-genome analyses on different strain sets of Bacillus subtilis to understand how individual strains affect the performance and output quality of these analyses. We found that the results of B. subtilis pan-genome analyses can be influenced by the inclusion of incorrectly classified Bacillus subspecies strains, phylogenetically distinct strains, engineered genome-reduced strains, chimeric strains, strains with a large number of unique genes or a large proportion of pseudogenes, and multiple clonal strains. Because these confounding strains can seriously distort the quality and true landscape of the pan-genome, they should be removed during pan-genome analysis. Our study provides new insights into removing the biases introduced by confounding strains at the beginning of data processing. This yields a closer representation of a high-quality pan-genome landscape of B. subtilis that better reflects the performance and credibility of its pan-genome. The procedure could be added as an important quality control step in pan-genome analyses to improve their efficiency, ultimately contributing to a better understanding of genome function, evolution and genome-reduction strategies for B. subtilis in the future.
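The abstract describes the strain filter only qualitatively. As a rough illustration, the sketch below assumes each strain has already been reduced to a set of gene-cluster IDs, a pseudogene fraction and an ANI value against a reference genome (all hypothetical inputs). It partitions the pan-genome into core, accessory and unique clusters and flags strains that trip simple checks. The field names, the thresholds and the identical-gene-content test for clonality are placeholders chosen for illustration, not the authors' criteria. The ANI check is only a coarse stand-in for spotting misclassified or phylogenetically distinct strains, and genome-reduced or chimeric strains are not modelled at all.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Strain:
    """Hypothetical per-strain summary; field names are illustrative only."""
    name: str
    genes: frozenset[str]        # gene-cluster IDs present in the strain
    pseudogene_fraction: float   # share of annotated genes called as pseudogenes
    ani_to_reference: float      # ANI (%) against a chosen reference genome


def partition_pan_genome(strains: list[Strain]) -> tuple[set[str], set[str], set[str]]:
    """Split clusters into core (in every strain), unique (in exactly one) and accessory (the rest)."""
    counts: dict[str, int] = {}
    for s in strains:
        for g in s.genes:
            counts[g] = counts.get(g, 0) + 1
    n = len(strains)
    core = {g for g, c in counts.items() if c == n}
    # With a single strain every gene is "core", so only call genes unique when n > 1.
    unique = {g for g, c in counts.items() if c == 1 and n > 1}
    accessory = set(counts) - core - unique
    return core, accessory, unique


def flag_confounding(strains: list[Strain], ani_cutoff: float = 95.0,
                     max_unique: int = 500, max_pseudo: float = 0.10) -> dict[str, list[str]]:
    """Return {strain name: [reasons]} for strains failing any placeholder check."""
    _, _, unique = partition_pan_genome(strains)
    flags: dict[str, list[str]] = {}
    first_with_profile: dict[frozenset[str], str] = {}
    for s in strains:
        reasons = []
        if s.ani_to_reference < ani_cutoff:
            reasons.append("low ANI to reference")
        if len(s.genes & unique) > max_unique:
            reasons.append("many unique genes")
        if s.pseudogene_fraction > max_pseudo:
            reasons.append("high pseudogene fraction")
        # Crude clonality proxy: identical gene content to an earlier strain.
        if s.genes in first_with_profile:
            reasons.append(f"clonal with {first_with_profile[s.genes]}")
        else:
            first_with_profile[s.genes] = s.name
        if reasons:
            flags[s.name] = reasons
    return flags


# Toy data (invented for illustration): C duplicates A; D is divergent and pseudogene-rich.
toy = [
    Strain("A", frozenset({"g1", "g2", "g3"}), 0.02, 99.0),
    Strain("B", frozenset({"g1", "g2", "g4"}), 0.03, 98.8),
    Strain("C", frozenset({"g1", "g2", "g3"}), 0.02, 99.1),
    Strain("D", frozenset({"g1", "x1", "x2", "x3"}), 0.25, 93.0),
]
flags = flag_confounding(toy, max_unique=2)
kept = [s for s in toy if s.name not in flags]
core, accessory, unique = partition_pan_genome(kept)
print(flags)
print(sorted(core), sorted(accessory), sorted(unique))
```

Because core and unique sets depend on which strains are included, removing flagged strains reshapes the partition: in the toy run, g2 moves from accessory to core and g3 from accessory to unique once C and D are dropped. A filter of this kind might therefore be re-run until no further strains are flagged.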
