4.7 Article Proceedings Paper

Gene2vec: distributed representation of genes based on co-expression

Journal

BMC GENOMICS
Volume 20, Issue -, Pages -

Publisher

BMC
DOI: 10.1186/s12864-018-5370-x

Keywords

Distributed representation; Gene2Vec; Gene co-expression; Embedding; Word2vec; Gene-gene interaction

Funding

  1. Cancer Prevention Research Institute of Texas (CPRIT) [RP160015]

Background: Existing functional descriptions of genes are categorical, discrete, and mostly produced through manual processes. In this work, we explore the idea of gene embedding, a distributed representation of genes, in the spirit of word embedding.

Results: In a purely data-driven fashion, we trained a 200-dimensional vector representation of all human genes, using gene co-expression patterns in 984 data sets from the GEO databases. These vectors capture the functional relatedness of genes in terms of recovering known pathways: the average inner product (similarity) of genes within a pathway is 1.52 times greater than that of random genes. Using t-SNE, we produced a gene co-expression map that shows local concentrations of tissue-specific genes. We also illustrated the usefulness of the embedded gene vectors, laden with rich information on gene co-expression patterns, in tasks such as gene-gene interaction prediction.

Conclusions: We proposed a machine learning method that utilizes transcriptome-wide gene co-expression to generate a distributed representation of genes. We further demonstrated the utility of this representation by predicting gene-gene interactions based solely on gene names. The distributed representation of genes could be useful for further bioinformatics applications.
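The pathway comparison reported in the results (average inner product of genes within a pathway versus that of random genes) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the toy two-dimensional vectors, and the gene labels `G1` to `G4` are hypothetical, and the "random genes" baseline is assumed here to be the mean inner product over all gene pairs in the vocabulary, which may differ from the procedure used in the paper.

```python
from itertools import combinations


def inner_product(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def mean_pairwise_inner_product(embeddings, genes):
    """Average inner product over all unordered pairs of the given genes."""
    pairs = list(combinations(genes, 2))
    if not pairs:
        raise ValueError("need at least two genes to form a pair")
    total = sum(inner_product(embeddings[a], embeddings[b]) for a, b in pairs)
    return total / len(pairs)


def similarity_ratio(embeddings, pathway):
    """Within-pathway mean similarity divided by the all-pairs baseline.

    Assumption: the baseline is the mean over every gene pair in the
    vocabulary, i.e. the expected similarity of a randomly drawn pair.
    The ratio is only meaningful when that baseline is positive.
    """
    within = mean_pairwise_inner_product(embeddings, pathway)
    baseline = mean_pairwise_inner_product(embeddings, sorted(embeddings))
    if baseline <= 0:
        raise ValueError("baseline similarity must be positive")
    return within / baseline


# Hypothetical toy embeddings (real vectors would be 200-dimensional).
toy_embeddings = {
    "G1": [1.0, 0.0],
    "G2": [1.0, 0.0],
    "G3": [0.0, 1.0],
    "G4": [0.0, 1.0],
}

# Hypothetical pathway made of two genes with identical vectors.
toy_pathway = ["G1", "G2"]

if __name__ == "__main__":
    print(similarity_ratio(toy_embeddings, toy_pathway))
```

A ratio above 1 means the pathway's genes are, on average, more similar to each other than a random pair. Enumerating all pairs is quadratic in the number of genes, so at the scale of the full human gene set the baseline would more practically be estimated from a random sample of pairs.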
