4.6 Article

Computational identification of promoters in Klebsiella aerogenes by using support vector machine

Journal

FRONTIERS IN MICROBIOLOGY
Volume 14

Publisher

FRONTIERS MEDIA SA
DOI: 10.3389/fmicb.2023.1200678

Keywords

promoter; pseudo k-tuple nucleotide composition; position-correlation scoring function; feature selection; support vector machine

This study aims to develop a machine learning-based model for predicting promoters in Klebsiella aerogenes. The model utilizes a unique encoding and optimization method to accurately identify promoter sequences in K. aerogenes.
Promoters are the basic functional cis-elements to which RNA polymerase binds to initiate gene transcription. A comprehensive understanding of gene expression and regulation depends on the precise identification of promoters, as they are among the most important components of gene expression. This study aimed to develop a machine learning-based model to predict promoters in Klebsiella aerogenes (K. aerogenes). In the prediction model, the promoter sequences in the K. aerogenes genome were encoded by pseudo k-tuple nucleotide composition (PseKNC) and a position-correlation scoring function (PCSF). The resulting numerical features were then optimized using minimum redundancy maximum relevance (mRMR) combined with a support vector machine (SVM) and 5-fold cross-validation (CV). Subsequently, the optimized features were fed into an SVM-based classifier to discriminate promoter sequences from non-promoter sequences in K. aerogenes. Results of 10-fold CV showed that the model achieved an overall accuracy of 96.0% and an area under the ROC curve (AUC) of 0.990. We hope that this model will support the study of promoters and gene regulation in K. aerogenes.
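
To make the encoding step more concrete, here is a minimal sketch of plain k-tuple nucleotide composition, the frequency component on which PseKNC builds. It is an illustration only, not the authors' implementation: it omits the physicochemical correlation terms that make the composition "pseudo", as well as PCSF, mRMR and the SVM. The function name `kmer_composition` and the example sequence are hypothetical.

```python
from itertools import product


def kmer_composition(seq, k=2):
    """Return normalized k-tuple frequencies, ordered lexicographically over ACGT.

    Simplified illustration (assumption): plain k-mer frequencies only,
    without the correlation terms that full PseKNC adds.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    # Slide a window of width k along the sequence and count each k-tuple.
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing ambiguous bases such as N
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[km] / total for km in kmers]


# Hypothetical example sequence; k=2 gives a 16-dimensional feature vector.
features = kmer_composition("TATAAT", k=2)
```

Each sequence becomes a fixed-length numeric vector (4^k entries), which is the kind of input that a feature-selection step and an SVM classifier can operate on.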
