Article

Identification of Protein Subcellular Localization With Network and Functional Embeddings

Journal

FRONTIERS IN GENETICS
Volume 11, Issue -, Pages -

Publisher

FRONTIERS MEDIA SA
DOI: 10.3389/fgene.2020.626500

Keywords

protein subcellular localization; network embedding; functional embedding; gene ontology; KEGG pathway

Funding

  1. Strategic Priority Research Program of Chinese Academy of Sciences [XDB38050200]
  2. National Key R&D Program of China [2018YFC0910403, 2017YFC1201200]
  3. Shanghai Municipal Science and Technology Major Project [2017SHZDZX01]
  4. National Natural Science Foundation of China [31701151]
  5. Shanghai Sailing Program [16YF1413800]
  6. Youth Innovation Promotion Association of Chinese Academy of Sciences (CAS) [2016245]
  7. Fund of the Key Laboratory of Tissue Microenvironment and Tumor of Chinese Academy of Sciences [202002]

This study introduces an embedding-based method for predicting protein subcellular localization from learned functional and network embeddings. The combined embeddings form novel protein representations for the final classification model, which outperforms conventional methods on a benchmark dataset of 4,861 proteins from 16 locations.

The functions of proteins are largely determined by their subcellular localization. Many computational methods for predicting protein subcellular localization have been proposed, but they still need improvement, particularly in how proteins are represented. In this study, we present an embedding-based method for predicting the subcellular localization of proteins. We first learn functional embeddings of KEGG/GO terms and use them to represent proteins. We then compute network embeddings of proteins on a protein-protein network. The functional and network embeddings are combined into novel protein representations, which are used to build the final classification model. On our collected benchmark dataset of 4,861 proteins from 16 locations, the best model achieves a Matthews correlation coefficient (MCC) of 0.872, outperforming multiple conventional methods.
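The abstract does not say how term embeddings are aggregated into a protein vector, which network-embedding method is used, or which classifier is trained. The Python sketch below illustrates one plausible pipeline under stated assumptions, not the authors' implementation: a protein's functional embedding is taken as the mean of the embeddings of its annotated KEGG/GO terms (an assumption), it is concatenated with the protein's network embedding (assumed precomputed, e.g. by a node2vec-style method), and predictions are scored with the multiclass generalization of the MCC, assuming each protein carries a single location label. The helper names (`mean_term_embedding`, `protein_representation`, `multiclass_mcc`) and the toy vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def mean_term_embedding(terms: Sequence[str], term_emb: Dict[str, Vector], dim: int) -> Vector:
    """Hypothetical aggregation: average the embeddings of a protein's KEGG/GO terms.

    Terms without a learned embedding are skipped; a protein with no known
    terms gets a zero vector of length `dim`.
    """
    known = [term_emb[t] for t in terms if t in term_emb]
    if not known:
        return [0.0] * dim
    return [sum(col) / len(known) for col in zip(*known)]


def protein_representation(terms: Sequence[str], term_emb: Dict[str, Vector],
                           net_emb: Sequence[float], dim: int) -> Vector:
    """Concatenate the functional (term-based) vector with the network embedding."""
    return mean_term_embedding(terms, term_emb, dim) + list(net_emb)


def multiclass_mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Multiclass MCC computed from the confusion matrix.

    Reduces to the usual binary MCC when there are two classes.
    """
    classes = sorted(set(y_true) | set(y_pred))
    idx = {c: i for i, c in enumerate(classes)}
    k = len(classes)
    cm = [[0] * k for _ in range(k)]  # rows: true class, columns: predicted class
    for t, p in zip(y_true, y_pred):
        cm[idx[t]][idx[p]] += 1
    s = sum(map(sum, cm))                                      # total samples
    c = sum(cm[i][i] for i in range(k))                        # correct predictions
    t_k = [sum(cm[i]) for i in range(k)]                       # true count per class
    p_k = [sum(cm[i][j] for i in range(k)) for j in range(k)]  # predicted count per class
    num = c * s - sum(t * p for t, p in zip(t_k, p_k))
    den = math.sqrt((s * s - sum(p * p for p in p_k)) * (s * s - sum(t * t for t in t_k)))
    return num / den if den else 0.0


if __name__ == "__main__":
    # Toy 2-d term embeddings (illustrative values only).
    term_emb = {"GO:0005634": [1.0, 0.0], "hsa04110": [0.0, 1.0]}
    # "GO:unknown" has no embedding and is ignored.
    x = protein_representation(["GO:0005634", "hsa04110", "GO:unknown"],
                               term_emb, [0.5, 0.5, 0.5], dim=2)
    print(x)  # [0.5, 0.5, 0.5, 0.5, 0.5]
    print(round(multiclass_mcc([0, 0, 1, 1], [0, 1, 1, 1]), 4))  # 0.5774
```

Averaging keeps the functional part of the vector at a fixed dimension regardless of how many terms a protein is annotated with; whatever classifier is chosen would then be trained on the concatenated vectors and evaluated with the MCC.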
