Article

Comparative Analysis of Unsupervised Protein Similarity Prediction Based on Graph Embedding

Journal

FRONTIERS IN GENETICS
Volume 12

Publisher

FRONTIERS MEDIA SA
DOI: 10.3389/fgene.2021.744334

Keywords

protein similarity; graph embedding; gene ontology; link prediction; DTW algorithm

Funding

  1. National Natural Science Foundation of China [61902430, 61873281, 61972226]

The study systematically analyzes how well the GO graph and the GOA graph support protein similarity calculation under different graph embedding methods. Graph embedding methods, especially those based on random walks, outperform traditional IC-based methods in calculating protein similarity. A comparison of link prediction results for the GO(DTW) and GOA(cosine) methods shows that GO(DTW) features provide highly effective information for analyzing protein similarity.
The study of protein-protein interactions and the determination of protein functions are important parts of proteomics. Computational methods based on Gene Ontology (GO) are used to study the similarity between proteins in order to explore their functions and possible interactions. GO is a set of standardized terms that describe gene products in terms of molecular function, biological process, and cellular component. Previous studies measured protein similarity primarily through the Information Content (IC) of GO terms. However, these methods tend to ignore the structural information between GO terms. To take this structural information into account, we systematically analyze the performance of the GO graph and the GO Annotation (GOA) graph in calculating protein similarity using different graph embedding methods. On real Human and Yeast datasets, the feature vectors of GO terms and proteins are learned with different graph embedding methods. Because proteins are annotated with different numbers of GO terms, we used Dynamic Time Warping (DTW) to calculate protein similarity in the GO graph and cosine similarity to calculate it in the GOA graph. Link prediction experiments were then performed to evaluate the reliability of the protein similarity networks constructed by the different methods. The results show that graph embedding methods have clear advantages over traditional IC-based methods. Random walk graph embedding methods, in particular, performed very well in calculating protein similarity. A comparison of link prediction results for the GO(DTW) and GOA(cosine) methods shows that GO(DTW) features provide highly effective information for analyzing the similarity among proteins.
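The two similarity measures named in the abstract can be illustrated with a minimal sketch. It assumes that, in the GO graph setting, each protein is represented by a sequence of GO term embedding vectors whose length varies from protein to protein, and that, in the GOA graph setting, each protein has a single embedding vector. The function names, the Euclidean local distance, the ordering of GO terms, and the conversion from DTW distance to a similarity score are all assumptions made for illustration; the abstract does not specify them.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (GOA-style protein vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors (assumption)
    return dot / (norm_u * norm_v)


def euclidean(u, v):
    """Local distance between two GO term embeddings (assumed choice)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def dtw_distance(seq_a, seq_b, dist=euclidean):
    """Classic dynamic-programming DTW between two sequences of vectors.

    The sequences may have different lengths, which is why DTW suits
    proteins annotated with different numbers of GO terms.
    """
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # advance in seq_a only
                                 cost[i][j - 1],      # advance in seq_b only
                                 cost[i - 1][j - 1])  # advance in both
    return cost[n][m]


def dtw_similarity(seq_a, seq_b):
    """Map a DTW distance to a score in (0, 1]; this mapping is an assumption."""
    return 1.0 / (1.0 + dtw_distance(seq_a, seq_b))


# Toy, made-up embeddings: protein A has two GO terms, protein B has three.
protein_a_terms = [[1.0, 0.0], [0.0, 1.0]]
protein_b_terms = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
print(dtw_similarity(protein_a_terms, protein_b_terms))

# Toy, made-up protein vectors for the GOA(cosine) setting.
print(cosine_similarity([0.2, 0.8, 0.1], [0.1, 0.9, 0.0]))
```

DTW is sensitive to the order of elements in each sequence, so any real use would need a fixed, documented ordering of a protein's GO terms; the sketch simply takes the order in which the vectors are supplied.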
