4.7 Article

Identification of Genome Sequences of Polyphosphate-Accumulating Organisms by Machine Learning

Publisher

FRONTIERS MEDIA SA
DOI: 10.3389/fcell.2020.626221

Keywords

polyphosphate-accumulating organisms; genome sequences; minimap2; support vector machine; prediction

Funding

  1. Nanqi Ren Studio, Academy of Environment & Ecology, Harbin Institute of Technology [HSCJ201702]
  2. National Science and Technology Major Project of Twelfth Five Years [2014ZX07201-012-2]

The study introduced a support vector machine algorithm to achieve rapid identification and prediction of PAOs using bioinformatics data, demonstrating high accuracy and stability.
In sewage treatment, polyphosphate-accumulating organisms (PAOs) are usually identified through biological experiments, which are complicated, time-consuming, and costly. Machine learning has been widely adopted in many fields in recent years but is seldom used in water treatment. This work presents a high-accuracy support vector machine (SVM) algorithm for the rapid identification and prediction of PAOs. We obtained 6,318 microbial genome sequences from the publicly available Microbial Genome Database for Comparative Analysis (MBGD). Minimap2 was used to compare the genomes in pairs and to read out their overlaps. The SVM model was built on the similarity of the genome sequences. With 10-fold cross-validation, the model's average accuracy is 0.9628 ± 0.019. Applying the model to 2,652 microorganisms yielded 22 potential PAOs. For most of these predicted PAOs, phosphorus removal characteristics could be indirectly verified from previous reports. The SVM model shows high prediction accuracy and good stability.
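
The abstract does not say how the minimap2 overlaps become SVM input features, so the following Python sketch shows only one plausible reading. It assumes minimap2 was run with its default PAF output, and that the similarity between two genomes is the number of matching residues summed over all hits, divided by the query length and capped at 1.0. That similarity definition, the function names `overlap_fractions` and `feature_matrix`, and the three example genomes are illustrative assumptions, not the authors' method.

```python
from collections import defaultdict

# Column positions in minimap2's PAF output (0-based).
QUERY_NAME, QUERY_LEN = 0, 1
TARGET_NAME = 5
N_MATCH = 9  # number of matching residues in the alignment


def overlap_fractions(paf_lines):
    """Return {(query, target): similarity} from PAF records.

    Assumption: similarity = matching residues summed over all hits,
    divided by the query length and capped at 1.0. The paper may
    define similarity differently.
    """
    matched = defaultdict(int)
    query_len = {}
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip blank or malformed lines
        q, t = fields[QUERY_NAME], fields[TARGET_NAME]
        query_len[q] = int(fields[QUERY_LEN])
        matched[(q, t)] += int(fields[N_MATCH])
    return {(q, t): min(1.0, m / query_len[q]) for (q, t), m in matched.items()}


def feature_matrix(similarity, genomes):
    """One row per genome: its similarity to each genome (0.0 if no hit)."""
    return [
        [1.0 if q == t else similarity.get((q, t), 0.0) for t in genomes]
        for q in genomes
    ]


# Hypothetical PAF records for three made-up genomes.
example_paf = [
    "gA\t1000\t0\t500\t+\tgB\t1200\t100\t600\t450\t500\t60",
    "gA\t1000\t600\t800\t-\tgB\t1200\t700\t900\t150\t200\t60",
    "gB\t1200\t0\t300\t+\tgC\t900\t0\t300\t240\t300\t60",
]
sim = overlap_fractions(example_paf)
X = feature_matrix(sim, ["gA", "gB", "gC"])
```

Each row of `X` could then serve as the feature vector for one genome in an SVM classifier evaluated with 10-fold cross-validation. Note that this similarity is asymmetric, because it depends on which genome is the query, and that overlapping hits can be counted twice, which is why the value is capped.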
