4.6 Article

DPProm: A Two-Layer Predictor for Identifying Promoters and Their Types on Phage Genome Using Deep Learning

Journal

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/JBHI.2022.3193224

Keywords

Bioinformatics; Genomics; Feature extraction; Training; Deep learning; DNA; Neural networks; multi-view features; phage; whole genome

Funding

  1. National Key Research and Development Program of China [2020YFA0908700]
  2. National Natural Science Foundation of China [11835014, U19A2064]
  3. Education Department of Anhui Province [KJ2020A0047]

This study introduces DPProm, a two-layer model for predicting phage promoters and their types. The first layer, DPProm-1L, uses a dual-channel deep neural network ensemble method to identify whether a DNA sequence is a promoter or a non-promoter. The second layer, DPProm-2L, predicts the promoter type. Experimental results show that DPProm outperforms existing methods and effectively reduces the false positive rate. A user-friendly web interface is also provided.
As the number of phage genomes grows, new bioinformatics methods for phage genome annotation are urgently needed. Promoters are DNA regions that play an important role in the regulation of gene transcription. In the post-genomic era, the availability of data makes it possible to build robust computational models for promoter identification. In this work, we introduce DPProm, a two-layer model composed of DPProm-1L and DPProm-2L, to predict promoters and their types in phages. The first layer, DPProm-1L, is a dual-channel deep neural network ensemble method that fuses multi-view features (a sequence feature and a handcrafted feature) to identify whether a DNA sequence is a promoter or a non-promoter. The sequence feature is extracted with a convolutional neural network (CNN), and the handcrafted feature combines free energy, GC content, cumulative skew, and Z curve features. The second layer, DPProm-2L, is a CNN-based model trained to predict the promoter type (host or phage). To enable prediction on whole genomes, DPProm is combined with a novel sequence data processing workflow that contains sliding-window and sequence-merging modules. Experimental results show that DPProm outperforms state-of-the-art methods and effectively decreases the false positive rate in whole-genome prediction. Furthermore, we provide a user-friendly web interface at http://bioinfo.ahu.edu.cn/DPProm. We expect that DPProm can serve as a useful tool for identifying promoters and their types.
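
To make the handcrafted features and the whole-genome workflow more concrete, the following is a minimal Python sketch, not the authors' implementation. It computes GC content and the end point of the standard Z curve for a sequence, cuts a genome into overlapping windows, and merges overlapping windows that a classifier has flagged. All function names, the window length, and the step size are hypothetical choices made for illustration; the abstract does not state the parameters DPProm uses, and its Z curve features may be defined differently.

```python
from typing import List, Tuple


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def z_curve_endpoint(seq: str) -> Tuple[int, int, int]:
    """End point (x, y, z) of the standard Z curve.

    x: purines (A, G) minus pyrimidines (C, T)
    y: amino bases (A, C) minus keto bases (G, T)
    z: weak H-bond bases (A, T) minus strong H-bond bases (C, G)
    Assumption: the paper may use a different Z curve parameterisation.
    """
    seq = seq.upper()
    a, c, g, t = (seq.count(b) for b in "ACGT")
    return (a + g) - (c + t), (a + c) - (g + t), (a + t) - (c + g)


def sliding_windows(genome: str, window: int, step: int) -> List[Tuple[int, int, str]]:
    """Return (start, end, subsequence) for each full-length window."""
    return [
        (start, start + window, genome[start:start + window])
        for start in range(0, len(genome) - window + 1, step)
    ]


def merge_intervals(hits: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged: List[Tuple[int, int]] = []
    for start, end in sorted(hits):
        if merged and start <= merged[-1][1]:
            # Extend the previous interval instead of opening a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


if __name__ == "__main__":
    genome = "ACGTACGTGGCC"  # toy sequence, not real phage data
    windows = sliding_windows(genome, window=4, step=2)
    # Stand-in for a trained classifier: flag GC-rich windows (hypothetical rule).
    flagged = [(s, e) for s, e, sub in windows if gc_content(sub) > 0.5]
    print(merge_intervals(flagged))
```

In a real pipeline the GC-rich rule would be replaced by the predictions of the trained first-layer model, and the merged regions would then be passed to the second layer for type prediction.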
