Article

CNNLSTMac4CPred: A Hybrid Model for N4-Acetylcytidine Prediction

Publisher

SPRINGER HEIDELBERG
DOI: 10.1007/s12539-021-00500-0

Keywords

N4-Acetylcytidine; RNA modification; Deep learning; XGBoost; Long short-term memory; Convolution neural network

Funding

  1. National Natural Science Foundation of China [61672356, 11871061]
  2. Scientific Research Fund of Hunan Provincial Education Department [21A0466, 18A253]
  3. Open Project of Hunan Key Laboratory for Computation and Simulation in Science and Engineering [2019LCESE03]
  4. Postgraduate Scientific Research Innovation Project of Hunan Province [CX20211271]
  5. Shaoyang University Innovation Foundation for Postgraduate [CX2021SY001, CX2021SY033]

N4-Acetylcytidine (ac4C) is a highly conserved RNA modification playing versatile roles in cellular processes. Researchers proposed a hybrid model combining semantic and traditional features to predict ac4C sequences, achieving better performance than other methods in cross-validation and independent testing. The proposed model has been implemented as a user-friendly web server.
N4-Acetylcytidine (ac4C) is a highly conserved and widespread post-transcriptional RNA modification that plays versatile roles in cellular processes. Owing to limitations in techniques and knowledge, large-scale identification of ac4C remains a challenging task. RNA sequences resemble natural-language sentences in that they carry semantics. Inspired by this analogy, we proposed a hybrid model for ac4C prediction. The model used long short-term memory (LSTM) and a convolutional neural network (CNN) to extract the semantic features hidden in the sequences. These semantic features were combined with two traditional features (k-nucleotide frequencies and pseudo tri-tuple nucleotide composition) to represent ac4C and non-ac4C sequences, and eXtreme Gradient Boosting (XGBoost) was used as the learning algorithm. Five-fold cross-validation on the training set of 1160 ac4C and 10,855 non-ac4C sequences yielded an area under the receiver operating characteristic curve (AUROC) of 0.9004, and the independent test on 469 ac4C and 4343 non-ac4C sequences reached an AUROC of 0.8825. The model obtained a sensitivity of 0.6474 in the five-fold cross-validation and 0.6290 in the independent test, outperforming two state-of-the-art methods. Semantic features alone performed better than either k-nucleotide frequencies or pseudo tri-tuple nucleotide composition, suggesting that ac4C sequences carry semantic information. The proposed hybrid model was implemented as a user-friendly web server that is freely available to the scientific community: http://47.113.117.61/ac4c/. The presented model and tool should be useful for identifying ac4C on a large scale.
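
To make one of the traditional features concrete, the following is a minimal sketch of how k-nucleotide frequencies could be computed for an RNA sequence. The abstract does not state which values of k were used or how the counts were normalized, so the function name k_nucleotide_frequencies, the ACGU alphabet ordering, the T-to-U conversion, and the normalization by the number of valid windows are all assumptions made for illustration rather than the authors' implementation.

```python
from itertools import product

ALPHABET = "ACGU"  # assumed RNA alphabet; ordering is an arbitrary choice


def k_nucleotide_frequencies(sequence, k):
    """Return the frequency of every possible k-mer as a fixed-length vector.

    The vector has 4**k entries, ordered lexicographically over ALPHABET.
    Windows containing characters outside ALPHABET (e.g. 'N') are skipped.
    """
    # Assumption: DNA-style input is accepted by mapping T to U.
    seq = sequence.upper().replace("T", "U")
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    # Slide a window of width k across the sequence, one position at a time.
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
            total += 1
    if total == 0:
        # No valid window: return an all-zero vector instead of dividing by zero.
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


# Usage: mononucleotide frequencies of a four-base sequence.
print(k_nucleotide_frequencies("ACGU", 1))  # [0.25, 0.25, 0.25, 0.25]
```

In a pipeline like the one described, vectors of this kind for one or more values of k would be concatenated with the other feature groups before being passed to the classifier; that concatenation step is not shown here.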

