Article

iPromoter-ET: Identifying promoters and their strength by extremely randomized trees-based feature selection

Journal

ANALYTICAL BIOCHEMISTRY
Volume 630, Issue -, Pages -

Publisher

ACADEMIC PRESS INC ELSEVIER SCIENCE
DOI: 10.1016/j.ab.2021.114335

Keywords

Promoters; K-mer; Binary encoding; Distance transformation; Extra trees

Funding

  1. National Natural Science Foundation of China [11601407]
  2. Natural Science Basic Research Program of Shaanxi [2021JM-115, 2021JQ-657]
  3. Fundamental Research Funds for the Central Universities [JB210715]


The study developed a computational tool named iPromoter-ET for identifying promoters and their strength. It combines k-mer composition, binary encoding, and distance-transformation features with extra-trees feature selection, and is reported to outperform existing models in accuracy and stability when classifying DNA sequences.
A promoter is a region of DNA that determines the transcription of a particular gene. The sigma factors of RNA polymerase recognize the promoter and facilitate the binding of the RNA polymerase to it. Owing to the importance of promoters in genome research, and given the avalanche of DNA sequences discovered in the post-genomic age, there is an urgent need for computational tools that effectively identify promoters and their strength. In this paper, we develop a model named iPromoter-ET that uses k-mer nucleotide composition, binary encoding, and dinucleotide property matrix-based distance transformation for feature extraction, and extremely randomized trees (extra trees) for feature selection. Its first layer identifies whether a DNA sequence is a promoter or not, while its second layer classifies promoter samples as strong or weak. A support vector machine performs the identification, and five-fold cross-validation is used to assess performance. The results indicate that our model remarkably outperforms the existing models in both the first and second layers in terms of accuracy and stability. We anticipate that our proposed model will become a very effective intelligent tool, or at the least, a complementary tool to the existing methods of identifying promoters and their strength. Moreover, the datasets and code for iPromoter-ET are freely available at https://github.com/shengli0201/iPromoter-ET.
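As a rough illustration of two of the feature types named in the abstract, the following minimal sketch computes a normalized k-mer composition vector and a binary (one-hot) encoding for a DNA sequence. It is not taken from the iPromoter-ET repository: the function names, the A/C/G/T bit assignment, the handling of ambiguous bases, and the default k = 2 are all assumptions made for illustration, and the dinucleotide property matrix-based distance transformation is not shown.

```python
from itertools import product

ALPHABET = "ACGT"

# Assumed one-hot mapping; the paper's actual bit assignment may differ.
BINARY_CODE = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}


def kmer_composition(seq, k=2):
    """Return normalized k-mer frequencies, ordered lexicographically by k-mer."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing ambiguous bases such as N
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[kmer] / total for kmer in kmers]


def binary_encoding(seq):
    """Return a flat vector with four bits per nucleotide (all zeros if unknown)."""
    vector = []
    for base in seq.upper():
        vector.extend(BINARY_CODE.get(base, (0, 0, 0, 0)))
    return vector


def extract_features(seq, k=2):
    """Concatenate the k-mer composition and the binary encoding (hypothetical helper)."""
    return kmer_composition(seq, k) + binary_encoding(seq)
```

For a sequence of length n, `extract_features` yields 4^k + 4n values. In a pipeline like the one the abstract describes, such vectors would then pass through feature selection before being handed to a classifier.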