Article

HiG2Vec: hierarchical representations of Gene Ontology and genes in the Poincaré ball

Journal

BIOINFORMATICS
Volume 37, Issue 18, Pages 2971-2980

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bioinformatics/btab193

Funding

  1. National Research Foundation of Korea - Korea government (MSIT) [NRF-2019R1A2C1006608]
  2. ITRC (Information Technology Research Center) support program [IITP-2020-2018-0-01431]
  3. National Institutes of Health [S10OD023495]

The study proposes HiG2Vec, a hierarchical representation of GO terms and genes learned with Poincaré embedding. Compared with previous approaches, it captures the hierarchical structure better and predicts gene interactions as well or better. HiG2Vec also captures the semantics of GO and genes more effectively and makes better use of the available data, so it can be applied robustly to a range of biological knowledge manipulation tasks.
Motivation: Knowledge manipulation of Gene Ontology (GO) and Gene Ontology Annotation (GOA) relies primarily on vector representations of GO terms and genes. Previous studies have represented GO terms and genes or gene products in Euclidean space, using embedding methods such as Word2Vec-based approaches to turn entities into numeric vectors whose semantic similarity can be measured. However, embedding large graph-structured data in Euclidean space inevitably loses information about latent hierarchies, so the semantics of GO and GOA cannot be captured optimally. Hyperbolic spaces such as the Poincaré ball are better suited to modeling hierarchies: because of their negative curvature, distances grow exponentially as points approach the boundary.
Results: In this article, we propose hierarchical representations of GO and genes (HiG2Vec), obtained by applying Poincaré embedding, which is specialized for representing hierarchies, in a two-step procedure: GO embedding followed by gene embedding. Through experiments, we show that our model represents the hierarchical structure better than other approaches and predicts interactions between genes or gene products as well as or better than previous studies. The results indicate that HiG2Vec is superior to other methods both in capturing GO and gene semantics and in data utilization. It can be robustly applied to manipulate various kinds of biological knowledge.
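
To make the geometric property mentioned in the Motivation concrete, the following is a minimal sketch of the standard distance function on the Poincaré ball, d(u, v) = arcosh(1 + 2‖u − v‖² / ((1 − ‖u‖²)(1 − ‖v‖²))). It is not taken from the HiG2Vec code; the function names `poincare_distance` and `euclidean_distance` and the sample points are assumptions chosen for illustration only.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit ball.

    Illustrative helper (hypothetical name), not the authors' implementation.
    """
    sq_u = sum(x * x for x in u)
    sq_v = sum(x * x for x in v)
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))


def euclidean_distance(u, v):
    """Ordinary Euclidean distance, shown for comparison."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Pairs of points with the same Euclidean gap, placed at increasing radius.
# The Euclidean distance stays constant while the hyperbolic distance grows.
for r in (0.0, 0.5, 0.9, 0.94):
    u, v = (r, 0.0), (r + 0.05, 0.0)
    print(f"r={r:.2f}  euclidean={euclidean_distance(u, v):.3f}  "
          f"poincare={poincare_distance(u, v):.3f}")
```

In such a space, general terms of a hierarchy can sit near the origin while the many specific terms spread out toward the boundary, where there is far more room to keep them apart than in a Euclidean space of the same dimension.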
