Article

Identifying Protein Subcellular Location with Embedding Features Learned from Networks

Journal

CURRENT PROTEOMICS
Volume 18, Issue 5, Pages 646-660

Publisher

BENTHAM SCIENCE PUBL LTD
DOI: 10.2174/1570164617999201124142950

Keywords

Protein subcellular location prediction; network embedding algorithm; DeepWalk; Node2vec; Mashup; machine learning algorithm; support vector machine; random forest

Funding

  1. The Key-Area Research and Development Program of Guangdong Province [2018B020203003]
  2. Guangzhou science and technology planning project [201707020007]
  3. Science and Technology Planning Project of Guangdong Province, China [2017A010405039]


This study analyzed features generated by three network embedding algorithms on one or multiple protein networks and found that the features produced by the Mashup algorithm on multiple networks were highly informative for predicting protein subcellular location, yielding models superior to some classic ones.
Background: Identifying protein subcellular location is an important problem because subcellular location is closely related to protein function. Determining locations through biological experiments is fundamental, but such experiments are costly and time-consuming. An alternative is to design effective computational methods.
Objective: Several computational methods have been proposed to date, but they mainly adopt features derived from the proteins themselves. Meanwhile, with the development of network techniques, several embedding algorithms have been proposed that encode the nodes of a network into feature vectors. Such algorithms connect networks with traditional classification algorithms and thus provide a new way to construct models for predicting protein subcellular location.
Methods: In this study, we analyzed features produced by three network embedding algorithms (DeepWalk, Node2vec and Mashup) applied to one or multiple protein networks. The obtained features were fed to a machine learning algorithm (support vector machine or random forest) to construct the model. Cross-validation was adopted to evaluate all constructed models.
Results: Under cross-validation, the embedding features yielded by Mashup on multiple networks were quite informative for predicting protein subcellular location. The model based on these features was superior to some classic models.
Conclusion: Embedding features yielded by a proper and powerful network embedding algorithm were effective for building models to predict protein subcellular location, providing new pipelines for building more efficient models.
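
The pipeline described under Methods (derive node features from a protein network, train a classifier on them, evaluate by cross-validation) can be illustrated with a minimal sketch. It is not the authors' implementation. Random walk with restart, the diffusion step that Mashup starts from, stands in for the full embedding algorithms, and a nearest-centroid rule stands in for the support vector machine or random forest. The eight-node network, its labels, the parameter values, and all function names are hypothetical.

```python
import random

# Hypothetical toy protein network: two 4-node cliques joined by one edge.
toy_network = {
    "A1": ["A2", "A3", "A4"],
    "A2": ["A1", "A3", "A4"],
    "A3": ["A1", "A2", "A4"],
    "A4": ["A1", "A2", "A3", "B1"],
    "B1": ["B2", "B3", "B4", "A4"],
    "B2": ["B1", "B3", "B4"],
    "B3": ["B1", "B2", "B4"],
    "B4": ["B1", "B2", "B3"],
}
# Hypothetical location labels, one per protein.
toy_labels = {name: ("nucleus" if name.startswith("A") else "cytoplasm")
              for name in toy_network}


def rwr_features(adj, restart=0.5, iters=50):
    """Run a random walk with restart from every node.

    Returns the sorted node names and, for each node, its visiting
    distribution over all nodes (used here as the feature vector).
    Assumes every node has at least one neighbour.
    """
    nodes = sorted(adj)
    index = {name: i for i, name in enumerate(nodes)}
    feats = []
    for start in nodes:
        p = [0.0] * len(nodes)
        p[index[start]] = 1.0
        for _ in range(iters):
            nxt = [0.0] * len(nodes)
            for u in nodes:
                share = (1.0 - restart) * p[index[u]] / len(adj[u])
                for v in adj[u]:
                    nxt[index[v]] += share
            nxt[index[start]] += restart  # jump back to the start node
            p = nxt
        feats.append(p)
    return nodes, feats


def nearest_centroid_predict(train_x, train_y, x):
    """Predict the label whose mean training vector is closest to x."""
    sums, counts = {}, {}
    for vec, label in zip(train_x, train_y):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1

    def squared_distance(label):
        return sum((s / counts[label] - xi) ** 2
                   for s, xi in zip(sums[label], x))

    return min(sorted(sums), key=squared_distance)


def cross_validate(feats, labels, k=4, seed=0):
    """Return k-fold cross-validation accuracy of the nearest-centroid rule."""
    order = list(range(len(feats)))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in order if i not in held_out]
        train_x = [feats[i] for i in train]
        train_y = [labels[i] for i in train]
        for i in fold:
            if nearest_centroid_predict(train_x, train_y, feats[i]) == labels[i]:
                correct += 1
    return correct / len(feats)


nodes, feats = rwr_features(toy_network)
accuracy = cross_validate(feats, [toy_labels[n] for n in nodes], k=4)
print("cross-validation accuracy:", accuracy)
```

As in the study, the features here are computed from the whole network before any labels are used, so only the classifier is refit in each fold. Replacing `rwr_features` with a DeepWalk, Node2vec or Mashup embedding, and the nearest-centroid rule with a support vector machine or random forest, would bring the sketch closer to the setup the abstract describes.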
