Article

Predicting Sites of Epitranscriptome Modifications Using Unsupervised Representation Learning Based on Generative Adversarial Networks

Journal

FRONTIERS IN PHYSICS
Volume 8, Issue -, Pages -

Publisher

FRONTIERS MEDIA SA
DOI: 10.3389/fphy.2020.00196

Keywords

N6-methyladenosine (m6A); epitranscriptome; RNA modification site prediction; generative adversarial networks (GANs); unsupervised representation learning; methylated RNA immunoprecipitation sequencing (MeRIP-Seq)

Funding

  1. National Institutes of Health [R01GM113245, CTSA 1UL1RR025767-01, K99CA248944]
  2. Cancer Prevention and Research Institute of Texas [RP190346, RP160732]
  3. San Antonio Life Sciences Institute (SALSI Innovation Challenge Award 2016)
  4. San Antonio Life Sciences Institute (SALSI Post-doctoral Research Fellowship 2018)
  5. Fund for Innovation in Cancer Informatics (ICI Fund)

Epitranscriptomics is an exciting field that studies the different types of modifications in transcripts, and predicting such modification sites from the transcript sequence is of significant interest. However, the scarcity of positive sites for most modifications poses critical challenges for training robust algorithms. To circumvent this problem, we propose MR-GAN, a generative adversarial network (GAN)-based model that is trained in an unsupervised fashion on entire pre-mRNA sequences to learn a low-dimensional embedding of transcriptomic sequences. MR-GAN was then applied to extract embeddings of the sequences in a training dataset we created for nine epitranscriptome modifications, namely m6A, m1A, m1G, m2G, m5C, m5U, 2'-O-Me, pseudouridine (Ψ), and dihydrouridine (D), for which positive samples are very limited. Prediction models were trained on the embeddings extracted by MR-GAN. We compared their prediction performance against one-hot encoding of the training sequences and against SRAMP, a state-of-the-art m6A site prediction algorithm, and demonstrated that the learned embeddings outperform one-hot encoding by a significant margin, with up to 15% improvement. Using MR-GAN, we also investigated the sequence motifs for each modification type and uncovered known motifs as well as new motifs that could not be detected from the sequences directly. The results demonstrate that transcriptome features extracted using unsupervised learning can lead to high precision in predicting multiple types of epitranscriptome modifications, even when the data size is small and extremely imbalanced.
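
The baseline representation that the abstract compares against, one-hot encoding of the training sequences, can be illustrated with a minimal sketch. This is not MR-GAN and not the authors' code: the channel order (A, C, G, U), mapping DNA-style T to U, encoding unknown bases such as N as all-zero rows, and the 5-nt example window are all assumptions made for illustration.

```python
from typing import List

# Assumed channel order; the order used in the paper is not stated.
ALPHABET = "ACGU"


def one_hot_encode(seq: str) -> List[List[int]]:
    """Return a len(seq) x 4 one-hot matrix for an RNA sequence.

    Assumption: DNA-style 'T' is treated as 'U', and any other symbol
    (e.g. 'N') becomes an all-zero row.
    """
    matrix = []
    for base in seq.upper().replace("T", "U"):
        row = [0] * len(ALPHABET)
        idx = ALPHABET.find(base)  # -1 if the symbol is not A/C/G/U
        if idx >= 0:
            row[idx] = 1
        matrix.append(row)
    return matrix


def flatten(matrix: List[List[int]]) -> List[int]:
    """Flatten the L x 4 matrix into a 4L-dimensional feature vector."""
    return [value for row in matrix for value in row]


if __name__ == "__main__":
    # Hypothetical 5-nt window around a candidate adenosine (a DRACH-like context).
    window = "GGACU"
    encoded = one_hot_encode(window)
    for base, row in zip(window, encoded):
        print(base, row)
    print(len(flatten(encoded)))  # 4 features per nucleotide -> 20
```

The resulting vector is sparse and grows linearly with window length (for example, a 41-nt window would give 164 features). The approach described in the abstract instead feeds the prediction models a learned low-dimensional embedding produced by MR-GAN.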
