Article

Identification of Genome Sequences of Polyphosphate-Accumulating Organisms by Machine Learning

Journal

Publisher

FRONTIERS MEDIA SA
DOI: 10.3389/fcell.2020.626221

Keywords

polyphosphate-accumulating organisms; genome sequences; minimap2; support vector machine; prediction

Funding

  1. Nanqi Ren Studio, Academy of Environment & Ecology, Harbin Institute of Technology [HSCJ201702]
  2. National Science and Technology Major Project of Twelfth Five Years [2014ZX07201-012-2]

The study introduced a support vector machine algorithm to achieve rapid identification and prediction of PAOs using bioinformatics data, demonstrating high accuracy and stability.
In sewage treatment, the identification of polyphosphate-accumulating organisms (PAOs) usually relies on biological experiments, which are complicated, time-consuming, and costly. In recent years, machine learning has been widely used in many fields, but it is seldom applied to water treatment. The present work presents a high-accuracy support vector machine (SVM) algorithm for the rapid identification and prediction of PAOs. We obtained 6,318 microbial genome sequences from the publicly available Microbial Genome Database for Comparative Analysis (MBGD). Minimap2 was used to compare the obtained genomes in pairs and to read the overlaps. The SVM model was built on the similarity of the genome sequences. With 10-fold cross-validation, the model reaches an average accuracy of 0.9628 ± 0.019. Applying the model to 2,652 microorganisms yielded 22 potential PAOs. For most of these predicted PAOs, the phosphorus removal characteristics could be indirectly verified from previous reports. The SVM model shows high prediction accuracy and good stability.
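The step from pairwise minimap2 overlaps to similarity features can be illustrated with a minimal sketch. The abstract does not say how similarity was computed, so the definition used here (matching residues summed over all overlaps, divided by the query length and capped at 1) is an assumption, as are the function names `parse_paf_line` and `similarity_matrix` and the simplification that each genome is a single sequence named by its genome ID. The only part taken as given is the column layout of minimap2's PAF output.

```python
from collections import defaultdict


def parse_paf_line(line):
    """Return (query, query_len, target, n_match) from one PAF record."""
    fields = line.rstrip("\n").split("\t")
    # PAF columns (0-based): 0 query name, 1 query length,
    # 5 target name, 9 number of matching residues.
    return fields[0], int(fields[1]), fields[5], int(fields[9])


def similarity_matrix(paf_lines, genomes):
    """Build a genome-by-genome similarity matrix from PAF overlaps.

    Assumption (not from the paper): similarity(q, t) is the total number
    of matching residues between q and t divided by the length of q,
    capped at 1.0. Pairs with no reported overlap get 0.0.
    """
    matched = defaultdict(int)   # (query, target) -> summed matches
    query_len = {}               # query -> sequence length
    for line in paf_lines:
        if not line.strip():
            continue
        q, q_len, t, n_match = parse_paf_line(line)
        query_len[q] = q_len
        matched[(q, t)] += n_match

    rows = []
    for q in genomes:
        row = []
        for t in genomes:
            if q == t:
                row.append(1.0)  # a genome is identical to itself
            elif (q, t) in matched:
                row.append(min(1.0, matched[(q, t)] / query_len[q]))
            else:
                row.append(0.0)
        rows.append(row)
    return rows


# Hypothetical PAF records for three made-up genomes A, B, and C.
example_paf = [
    "A\t1000\t0\t500\t+\tB\t2000\t100\t600\t450\t500\t60",
    "A\t1000\t600\t800\t+\tB\t2000\t700\t900\t150\t200\t60",
    "B\t2000\t0\t100\t-\tA\t1000\t0\t100\t100\t100\t60",
]
features = similarity_matrix(example_paf, ["A", "B", "C"])
```

Each row of `features` could then serve as the feature vector for one genome in an SVM classifier (for example scikit-learn's `SVC`, scored with 10-fold cross-validation), which mirrors the workflow the abstract describes without claiming to reproduce the authors' implementation.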

