4.6 Article

iPro2L-PSTKNC: A Two-Layer Predictor for Discovering Various Types of Promoters by Position Specific of Nucleotide Composition

Journal

IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS
Volume 25, Issue 6, Pages 2329-2337

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/JBHI.2020.3026735

Keywords

k-mer nucleotide composition; Promoter; position specific tendency; two-layer predictor

Funding

  1. National Natural Science Foundation of China [61772362, 61902271, 61972280]
  2. National Key R&D Program of China [2018YFC0910405, 2017YFC0908400]

Promoters, regulatory elements located near transcription start sites, initiate gene transcription. A novel two-layer predictor, iPro2L-PSTKNC, based on a new feature extraction model, PSTKNC, is developed to identify promoters in the E. coli genome. Among the algorithms compared, the ensemble SVM classifier performs best, reaching an accuracy of 90.05% and an MCC of 80.13%.
Promoters are DNA regulatory elements located proximal to the transcription start site that are responsible for initiating the transcription of specific genes. In Escherichia coli, promoters are recognized by sigma factors, which fall into multiple families according to their function and structure, such as sigma(24), sigma(28), sigma(38), sigma(54) and sigma(70). At present, these promoters are identified mainly by biological methods. However, because biological experiments are time-consuming and material-intensive, computational algorithms have emerged as a more efficient way to predict promoter classes. In this study, we develop a novel two-layer seamless predictor called iPro2L-PSTKNC to identify promoters in the E. coli genome. It is based on a newly proposed feature extraction model named the position specific tendencies of k-mer nucleotide composition (PSTKNC). The first layer is a binary classifier that predicts whether a sequence is a promoter or not. The second layer is a multi-class classifier that identifies which type an identified promoter belongs to. The ensemble SVM classifier performs best compared with other algorithms, achieving an accuracy of 90.05% and a Matthews correlation coefficient (MCC) of 80.13%. Our data and code are available at https://github.com/lyuyinuo/iPro2L-PSTKNC
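
The two-layer flow and the MCC metric described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: plain normalized k-mer frequencies stand in for PSTKNC, whose exact definition is not given in the abstract, and the names kmer_composition, two_layer_predict, is_promoter, sigma_type and mcc are hypothetical. The two classifiers are passed in as callables, so any trained model (for example an SVM) could be plugged in.

```python
from collections import Counter
from itertools import product
from math import sqrt


def kmer_composition(seq, k=2):
    """Return normalized frequencies of all 4**k k-mers in a DNA sequence.

    Assumption: this plain composition is only a stand-in for PSTKNC.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Only k-mers made of A/C/G/T contribute to the normalization.
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


def two_layer_predict(seq, is_promoter, sigma_type, k=2):
    """Layer 1 decides promoter vs. non-promoter; layer 2 runs only on hits.

    is_promoter and sigma_type are hypothetical callables that take a
    feature vector and return a bool and a class label, respectively.
    """
    features = kmer_composition(seq, k)
    if not is_promoter(features):
        return "non-promoter"
    return sigma_type(features)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

Passing the classifiers as arguments keeps the cascade logic separate from the choice of model, so the second layer is evaluated only for sequences that the first layer accepts.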
