Article

CNNLSTMac4CPred: A Hybrid Model for N4-Acetylcytidine Prediction

INTERDISCIPLINARY SCIENCES-COMPUTATIONAL LIFE SCIENCES (2022)

Journal

INTERDISCIPLINARY SCIENCES-COMPUTATIONAL LIFE SCIENCES

Volume 14, Issue 2, Pages 439-451

Publisher

SPRINGER HEIDELBERG

DOI: 10.1007/s12539-021-00500-0

Keywords

N4-Acetylcytidine; RNA modification; Deep learning; XGBoost; Long short-term memory; Convolution neural network

Category

Mathematical & Computational Biology

Funding

National Natural Science Foundation of China [61672356, 11871061]
Scientific Research Fund of Hunan Provincial Education Department [21A0466, 18A253]
Open Project of Hunan Key Laboratory for Computation and Simulation in Science and Engineering [2019LCESE03]
Postgraduate Scientific Research Innovation Project of Hunan Province [CX20211271]
Shaoyang University Innovation Foundation for Postgraduate [CX2021SY001, CX2021SY033]

AI summary

N4-Acetylcytidine (ac4C) is a highly conserved RNA modification that plays versatile roles in cellular processes. The authors propose a hybrid model that combines semantic and traditional features to predict ac4C sequences; it outperformed other methods in both cross-validation and independent testing. The model is available as a user-friendly web server.

Abstract

N4-Acetylcytidine (ac4C) is a highly conserved and widespread post-transcriptional RNA modification that plays versatile roles in cellular processes. Because of limitations in techniques and knowledge, large-scale identification of ac4C remains challenging. RNA sequences resemble natural-language sentences in that they carry semantics. Inspired by this analogy, we proposed a hybrid model for ac4C prediction. The model used long short-term memory and a convolutional neural network to extract the semantic features hidden in the sequences. These semantic features were combined with two traditional features (k-nucleotide frequencies and pseudo tri-tuple nucleotide composition) to represent ac4C and non-ac4C sequences. eXtreme Gradient Boosting was used as the learning algorithm. Five-fold cross-validation on a training set of 1160 ac4C and 10,855 non-ac4C sequences yielded an area under the receiver operating characteristic curve (AUROC) of 0.9004, and an independent test on 469 ac4C and 4343 non-ac4C sequences reached an AUROC of 0.8825. The model obtained a sensitivity of 0.6474 in the five-fold cross-validation and 0.6290 in the independent test, outperforming two state-of-the-art methods. The semantic features alone performed better than either k-nucleotide frequencies or pseudo tri-tuple nucleotide composition, implying that ac4C sequences carry semantic information. The proposed hybrid model was implemented as a user-friendly web server that is freely available to the scientific community: http://47.113.117.61/ac4c/. The presented model and tool should help identify ac4C on a large scale.
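
To make one of the traditional features concrete, the following is a minimal sketch of how k-nucleotide frequencies can be computed for an RNA sequence. It is not the authors' implementation: the function names, the choice of k = 1, 2, 3, the conversion of T to U, and the handling of ambiguous bases are all assumptions made for illustration, since the abstract does not specify them.

```python
from itertools import product

ALPHABET = "ACGU"  # assumed RNA alphabet; T is mapped to U below


def k_nucleotide_frequencies(seq, k=2):
    """Return the normalized frequency of each of the 4**k k-mers.

    Frequencies are listed in lexicographic order over ACGU.
    Hypothetical helper, not taken from the paper.
    """
    seq = seq.upper().replace("T", "U")
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    # Slide a window of width k over the sequence, one position at a time.
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows containing ambiguous bases such as N
            counts[kmer] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


def kmer_feature_vector(seq, ks=(1, 2, 3)):
    """Concatenate k-mer frequencies for several k (the values of k are an assumption)."""
    features = []
    for k in ks:
        features.extend(k_nucleotide_frequencies(seq, k))
    return features
```

With the assumed k = 1, 2, 3, each sequence maps to a vector of 4 + 16 + 64 = 84 values. In a pipeline like the one described, such a vector would be concatenated with the other feature groups before being passed to the classifier.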
