Article

Deep Learning Enables Fast and Accurate Imputation of Gene Expression

Journal

FRONTIERS IN GENETICS
Volume 12

Publisher

FRONTIERS MEDIA SA
DOI: 10.3389/fgene.2021.624128

Keywords

gene expression; transcriptomics; imputation; generative adversarial networks; machine learning; RNA-seq; GTEx; deep learning

Funding

  1. la Caixa Foundation [100010434, LCF/BQ/EU19/11710059]
  2. National Institutes of Health [R35HG010718, R01HG011138, R01GM140287, R01HL133559]
  3. W. D. Armstrong Trust Fund, University of Cambridge, UK
  4. Engineering and Physical Sciences Research Council
  5. MICA: Mental Health Data Pathfinder: University of Cambridge, Cambridgeshire and Peterborough NHS Foundation Trust
  6. Microsoft
  7. Medical Research Council [MC_PC_17213]

Two novel deep learning methods, PMI and GAIN-GTEx, are proposed for gene expression imputation; both compare favorably to standard methods in predictive performance and runtime. On protein-coding genes, PMI performs best in inductive imputation, while GAIN-GTEx performs best in in-place imputation. The results also indicate strong generalization to RNA-Seq data from three cancer types.

A question of fundamental biological significance is to what extent the expression of a subset of genes can be used to recover the full transcriptome, with important implications for biological discovery and clinical application. To address this challenge, we propose two novel deep learning methods, PMI and GAIN-GTEx, for gene expression imputation. To increase the applicability of our approach, we leverage data from GTEx v8, a reference resource that has generated a comprehensive collection of transcriptomes from a diverse set of human tissues. We show that our approaches compare favorably to several standard and state-of-the-art imputation methods in terms of predictive performance and runtime in two case studies and two imputation scenarios. In a comparison conducted on the protein-coding genes, PMI attains the highest performance in inductive imputation, whereas GAIN-GTEx outperforms the other methods in in-place imputation. Furthermore, our results indicate strong generalization on RNA-Seq data from three cancer types across varying levels of missingness. Our work can facilitate a cost-effective integration of large-scale RNA biorepositories into genomic studies of disease, with high applicability across diverse tissue types.
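
The abstract reports performance "across varying levels of missingness". As a rough illustration of what such an evaluation can look like, the sketch below hides a random fraction of entries in a synthetic samples-by-genes matrix, fills them with a per-gene mean, and scores the error on the hidden entries only. This is not PMI or GAIN-GTEx: the per-gene mean is a deliberately simple stand-in baseline, and the synthetic data, the function names (`mask_entries`, `impute_gene_means`, `masked_mse`, `evaluate`), and the choice of mean squared error are all assumptions made for illustration.

```python
import random


def mask_entries(matrix, missing_rate, rng):
    """Hide each entry independently with probability `missing_rate` (None = missing)."""
    return [[None if rng.random() < missing_rate else v for v in row] for row in matrix]


def impute_gene_means(masked):
    """Fill missing entries with the mean of the observed values of the same gene (column)."""
    n_genes = len(masked[0])
    means = []
    for j in range(n_genes):
        observed = [row[j] for row in masked if row[j] is not None]
        # Fall back to 0.0 if a gene has no observed values at all.
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in masked]


def masked_mse(truth, masked, imputed):
    """Mean squared error over the hidden entries only."""
    errors = [
        (t - p) ** 2
        for t_row, m_row, p_row in zip(truth, masked, imputed)
        for t, m, p in zip(t_row, m_row, p_row)
        if m is None
    ]
    return sum(errors) / len(errors) if errors else 0.0


def evaluate(missing_rates, n_samples=50, n_genes=20, seed=0):
    """Score the mean-imputation baseline at each missingness level (hypothetical setup)."""
    rng = random.Random(seed)
    # Synthetic "expression" matrix: each gene has its own baseline plus unit Gaussian noise.
    baselines = [rng.uniform(0.0, 10.0) for _ in range(n_genes)]
    truth = [[b + rng.gauss(0.0, 1.0) for b in baselines] for _ in range(n_samples)]
    results = {}
    for rate in missing_rates:
        masked = mask_entries(truth, rate, rng)
        imputed = impute_gene_means(masked)
        results[rate] = masked_mse(truth, masked, imputed)
    return results


if __name__ == "__main__":
    for rate, mse in evaluate([0.1, 0.5, 0.9]).items():
        print(f"missing rate {rate:.1f}: MSE on hidden entries = {mse:.3f}")
```

Scoring only the hidden entries keeps the metric from being inflated by values the imputer was given directly. A learned model would replace `impute_gene_means` while the masking and scoring steps stay the same.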
