Article

iPro2L-PSTKNC: A Two-Layer Predictor for Discovering Various Types of Promoters by Position Specific of Nucleotide Composition

Journal

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/JBHI.2020.3026735

Keywords

k-mer nucleotide composition; Promoter; position specific tendency; two-layer predictor

Funding

  1. National Natural Science Foundation of China [61772362, 61902271, 61972280]
  2. National Key R&D Program of China [2018YFC0910405, 2017YFC0908400]


Promoters, regulatory elements located near transcription start sites, initiate gene transcription. A novel two-layer predictor, iPro2L-PSTKNC, based on a new feature extraction model, PSTKNC, is developed to identify promoters in the E. coli genome. An SVM-based ensemble classifier performs best, with high accuracy and MCC.
Promoters are DNA regulatory elements located proximal to the transcription start site that are responsible for initiating the transcription of specific genes. In Escherichia coli, promoters are recognized by sigma factors, which fall into multiple families with distinct functions and structures, such as sigma(24), sigma(28), sigma(38), sigma(54) and sigma(70). At present, these promoters are identified mainly by biological methods. However, because biological experiments are time-consuming and material-intensive, computational algorithms have emerged as a more effective way to predict promoter classes. In this study, we develop a novel two-layer seamless predictor called iPro2L-PSTKNC to identify promoters in the E. coli genome. It is based on a newly proposed feature extraction model named the position specific tendencies of k-mer nucleotide composition (PSTKNC). The first layer is a binary classifier that predicts whether a sequence is a promoter or not. The second layer is a multi-class classifier that identifies which type an identified promoter belongs to. An ensemble SVM classifier performs best compared with other algorithms, achieving an accuracy of 90.05% and a Matthews correlation coefficient (MCC) of 80.13%. Our data and code are available at https://github.com/lyuyinuo/iPro2L-PSTKNC
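The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the PSTKNC formula, so the "tendency" here is assumed to be the per-position frequency of a k-mer in promoter training sequences minus its frequency in non-promoter ones. All function names are hypothetical, and the two classifiers are passed in as plain callables rather than the SVM ensemble used in the paper.

```python
from collections import Counter
from math import sqrt


def position_kmer_freqs(seqs, k):
    """Relative frequency of each k-mer at each position (equal-length sequences)."""
    n_pos = len(seqs[0]) - k + 1
    counts = [Counter() for _ in range(n_pos)]
    for s in seqs:
        for i in range(n_pos):
            counts[i][s[i:i + k]] += 1
    return [{kmer: c / len(seqs) for kmer, c in pos.items()} for pos in counts]


def tendency_features(seq, pos_freqs, neg_freqs, k):
    """ASSUMED encoding: frequency among positives minus frequency among negatives."""
    return [
        pos_freqs[i].get(seq[i:i + k], 0.0) - neg_freqs[i].get(seq[i:i + k], 0.0)
        for i in range(len(seq) - k + 1)
    ]


def predict_two_layer(seq, is_promoter, sigma_type):
    """Layer 1: promoter or not. Layer 2 runs only on predicted promoters."""
    if not is_promoter(seq):
        return "non-promoter"
    return sigma_type(seq)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from a binary confusion matrix."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

In this sketch, the frequency tables would be built once from the training promoters and non-promoters, each sequence would then be encoded with tendency_features, and the resulting vectors would be fed to whichever classifiers stand behind is_promoter and sigma_type. The mcc helper returns a value in [-1, 1]; the abstract reports MCC as a percentage.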
