Journal
IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS
Volume 25, Issue 6, Pages 2329-2337
Publisher
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/JBHI.2020.3026735
Keywords
k-mer nucleotide composition; Promoter; position specific tendency; two-layer predictor
Funding
- National Natural Science Foundation of China [61772362, 61902271, 61972280]
- National Key R&D Program of China [2018YFC0910405, 2017YFC0908400]
Promoters are regulatory elements located near transcription start sites that initiate gene transcription. A novel two-layer predictor, iPro2L-PSTKNC, built on a new feature extraction model, PSTKNC, is developed to identify promoters in the E. coli genome. The SVM-based ensemble classifier performs best, reaching an accuracy of 90.05% and an MCC of 80.13%.
Promoters are DNA regulatory elements located proximal to the transcription start site and are responsible for initiating the transcription of specific genes. In Escherichia coli, promoters are recognized by sigma factors, which fall into multiple families with distinct functions and structures, such as sigma(24), sigma(28), sigma(38), sigma(54) and sigma(70). At present, these promoters are identified mainly by biological methods. However, because biological experiments are time-consuming and material-consuming, computational algorithms have emerged as a more efficient way to predict promoters and their classes. In this study, we develop a novel two-layer seamless predictor called iPro2L-PSTKNC to identify promoters in the E. coli genome. It is based on a newly proposed feature extraction model named the position specific tendencies of k-mer nucleotide composition (PSTKNC). The first layer is a binary classifier that predicts whether a sequence is a promoter or not. The second layer is a multi-class classifier that identifies which type a predicted promoter belongs to. Compared with other algorithms, the ensemble SVM classifier performs best, achieving an accuracy of 90.05% and a Matthews correlation coefficient (MCC) of 80.13%. Our data and code are available at https://github.com/lyuyinuo/iPro2L-PSTKNC
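The two-layer flow and the MCC metric described in the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions: the abstract does not define PSTKNC, so the feature function below computes only plain k-mer nucleotide composition as a stand-in, and the two classifiers are hypothetical callables rather than the authors' SVM ensemble. All function and parameter names (`kmer_composition`, `two_layer_predict`, `is_promoter`, `promoter_type`, `mcc`) are illustrative and do not come from the paper or its repository.

```python
from itertools import product
from math import sqrt


def kmer_composition(seq, k=2):
    """Normalized frequency of each k-mer over ACGT.

    Assumption: plain k-mer composition, NOT the paper's PSTKNC features.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows with ambiguous bases such as N
            counts[window] += 1
            total += 1
    return [counts[km] / total if total else 0.0 for km in kmers]


def two_layer_predict(seq, is_promoter, promoter_type, k=2):
    """Layer 1: promoter vs. non-promoter. Layer 2: promoter type.

    `is_promoter` and `promoter_type` are hypothetical stand-ins for
    trained classifiers; layer 2 runs only on predicted promoters.
    """
    features = kmer_composition(seq, k)
    if not is_promoter(features):
        return "non-promoter"
    return promoter_type(features)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

In this sketch the second layer is never called for sequences rejected by the first layer, which mirrors the cascade described above: type assignment applies only to sequences already predicted to be promoters.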