
Comparative Analysis of Unsupervised Protein Similarity Prediction Based on Graph Embedding

FRONTIERS IN GENETICS (2021)

Journal

FRONTIERS IN GENETICS

Volume 12

Publisher

FRONTIERS MEDIA SA

DOI: 10.3389/fgene.2021.744334

Keywords

protein similarity; graph embedding; gene ontology; link prediction; DTW algorithm

Funding

National Natural Science Foundation of China [61902430, 61873281, 61972226]

Automated Summary

The study systematically analyzes how well the GO graph and the GOA graph support protein similarity calculation under different graph embedding methods. It shows that graph embedding methods, especially those based on random walks, have advantages over traditional IC-based methods in calculating protein similarity. A comparison of link prediction results from the GO(DTW) and GOA(cosine) methods shows that GO(DTW) features provide highly effective information for analyzing protein similarity.

Abstract

The study of protein-protein interactions and the determination of protein functions are important parts of proteomics. Computational methods based on Gene Ontology (GO) are used to study the similarity between proteins in order to explore their functions and possible interactions. GO is a set of standardized terms that describe gene products in terms of molecular function, biological process, and cellular component. Previous methods for measuring protein similarity relied primarily on the Information Content (IC) of GO terms. However, these methods tend to ignore the structural information between GO terms. Taking this structural information into account, we systematically analyze the performance of the GO graph and the GO Annotation (GOA) graph in calculating protein similarity using different graph embedding methods. On real Human and Yeast datasets, the feature vectors of GO terms and proteins are learned with different graph embedding methods. Because proteins are annotated with different numbers of GO terms, we used Dynamic Time Warping (DTW) in the GO graph and cosine similarity in the GOA graph to calculate protein similarity. Link prediction experiments were then performed to evaluate the reliability of the protein similarity networks constructed by the different methods. The results show that graph embedding methods have clear advantages over traditional IC-based methods. Random walk graph embedding methods, in particular, performed very well in calculating protein similarity. A comparison of link prediction results from the GO(DTW) and GOA(cosine) methods shows that GO(DTW) features provide highly effective information for analyzing the similarity among proteins.
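To make the two similarity measures concrete, here is a minimal Python sketch, not the authors' implementation. It assumes that each protein in the GO graph setting is represented by an ordered list of GO term embedding vectors, and that each protein in the GOA graph setting has a single embedding vector. The function names, the toy vectors, the use of cosine distance as the local DTW cost, and the ordering of GO terms are all assumptions made for illustration; the abstract does not specify them.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def dtw_distance(seq_a, seq_b):
    """Classic dynamic-programming DTW between two sequences of vectors.

    The local cost is assumed to be cosine distance (1 - cosine similarity).
    The sequences may have different lengths, which is the reason DTW suits
    proteins annotated with different numbers of GO terms.
    """
    n, m = len(seq_a), len(seq_b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must contain at least one vector")
    # cost[i][j] = minimal accumulated cost of aligning seq_a[:i] with seq_b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = 1.0 - cosine_similarity(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance in seq_a only
                cost[i][j - 1],      # advance in seq_b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


# Hypothetical toy embeddings (2-dimensional for readability).
protein_a_terms = [[1.0, 0.0], [0.0, 1.0]]              # two GO terms
protein_b_terms = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]  # three GO terms
protein_c_terms = [[0.0, 1.0], [1.0, 0.0]]              # same terms as a, reversed

# GO graph setting: DTW over GO term vectors (lower means more similar).
dist_ab = dtw_distance(protein_a_terms, protein_b_terms)
dist_ac = dtw_distance(protein_a_terms, protein_c_terms)

# GOA graph setting: cosine similarity over whole-protein vectors.
sim_goa = cosine_similarity([0.5, 0.5], [1.0, 1.0])
```

In this sketch DTW depends on the order of the GO term vectors: `protein_a_terms` and `protein_c_terms` contain the same vectors but yield a nonzero distance. A real pipeline would therefore need a fixed ordering rule for each protein's GO terms, and that rule is not stated in the abstract.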
