4.7 Article

An improved residual network using deep fusion for identifying RNA 5-methylcytosine sites

Journal

BIOINFORMATICS
Volume 38, Issue 18, Pages 4271-4277

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bioinformatics/btac532

Funding

  1. National Natural Science Foundation of China [12101480]
  2. Natural Science Basic Research Program of Shaanxi [2021JM-115]
  3. Fundamental Research Funds for the Central Universities [JB210715]

This study develops a deep learning model to identify m5C sites in RNA sequences. The model utilizes sequence features extracted from the RNA sequences and incorporates them into an improved residual network for classification. Experimental results show a considerable improvement in accuracy compared to previous studies, demonstrating the robust performance of the model.
Motivation: 5-Methylcytosine (m5C) is a crucial post-transcriptional modification. As detection technology has advanced, it has been found in many types of RNA. Numerous studies indicate that m5C plays an essential role in biological processes such as tRNA recognition, stabilization of RNA structure and RNA metabolism. Identifying m5C sites through wet-lab experiments is costly and time-consuming, so computational models are commonly used instead. Given the computational advantages of deep learning, it is feasible to build the predictive model with deep learning algorithms.

Results: In this study, we construct a model to identify m5C sites based on a deep fusion approach with an improved residual network. First, sequence features are extracted from the RNA sequences using Kmer, K-tuple nucleotide frequency component (KNFC), pseudo dinucleotide composition (PseDNC) and physical and chemical property (PCP). Kmer and KNFC extract information from a statistical point of view, whereas PseDNC and PCP extract information from the physicochemical properties of RNA sequences. Then, the two groups of information are fused into new features using bidirectional long short-term memory (BiLSTM) and attention mechanisms, respectively. The fused features are then fed into the improved residual network for classification. Finally, 10-fold cross-validation and independent set testing are used to verify the credibility of the model. The results show that the accuracy reaches 91.87%, 95.55%, 92.27% and 95.60% on the training sets and independent test sets of Arabidopsis thaliana and M. musculus, respectively. This is a considerable improvement over previous studies and demonstrates the robust performance of our model.
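
As an illustration of the statistical features mentioned in the abstract, the following is a minimal sketch of a Kmer frequency encoding. It is not the authors' implementation: the function name `kmer_frequencies`, the `ACGU` alphabet ordering, the T-to-U conversion and the normalization by the number of valid k-mers are all assumptions made for this example.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGU"  # assumed nucleotide order for the feature vector


def kmer_frequencies(seq: str, k: int = 2) -> list[float]:
    """Return normalized k-mer frequencies of an RNA sequence.

    Hypothetical helper: the output has 4**k entries, one per k-mer,
    in lexicographic order over ALPHABET.
    """
    # Accept DNA-style input by mapping T to U (an assumption of this sketch).
    seq = seq.upper().replace("T", "U")
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Only windows made of A, C, G, U contribute to the total.
    total = sum(counts[m] for m in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


features = kmer_frequencies("AUGCCGUA", k=2)
print(len(features))  # one entry per dinucleotide
```

In a pipeline like the one described, vectors of this kind would be computed per sequence window and passed on to the fusion stage together with the other encodings; how the paper combines them is not specified beyond the abstract.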
