4.7 Article

OPA2Vec: combining formal and informal content of biomedical ontologies to improve similarity-based prediction

期刊

BIOINFORMATICS
卷 35, 期 12, 页码 2133-2140

出版社

OXFORD UNIV PRESS
DOI: 10.1093/bioinformatics/bty933

关键词

-

资金

  1. King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR) [FCC/1/1976-06, URF/1/2602-01, URF/1/3007-01, URF/1/3412-01, URF/1/3450-01, URF/1/3454-01]

向作者/读者索取更多资源

Motivation Ontologies are widely used in biology for data annotation, integration and analysis. In addition to formally structured axioms, ontologies contain meta-data in the form of annotation axioms which provide valuable pieces of information that characterize ontology classes. Annotation axioms commonly used in ontologies include class labels, descriptions or synonyms. Despite being a rich source of semantic information, the ontology meta-data are generally unexploited by ontology-based analysis methods such as semantic similarity measures. Results We propose a novel method, OPA2Vec, to generate vector representations of biological entities in ontologies by combining formal ontology axioms and annotation axioms from the ontology meta-data. We apply a Word2Vec model that has been pre-trained on either a corpus or abstracts or full-text articles to produce feature vectors from our collected data. We validate our method in two different ways: first, we use the obtained vector representations of proteins in a similarity measure to predict protein-protein interaction on two different datasets. Second, we evaluate our method on predicting gene-disease associations based on phenotype similarity by generating vector representations of genes and diseases using a phenotype ontology, and applying the obtained vectors to predict gene-disease associations using mouse model phenotypes. We demonstrate that OPA2Vec significantly outperforms existing methods for predicting gene-disease associations. Using evidence from mouse models, we apply OPA2Vec to identify candidate genes for several thousand rare and orphan diseases. OPA2Vec can be used to produce vector representations of any biomedical entity given any type of biomedical ontology. Availability and implementation https://github.com/bio-ontology-research-group/opa2vec Supplementary information Supplementary data are available at Bioinformatics online.

作者

我是这篇论文的作者
点击您的名字以认领此论文并将其添加到您的个人资料中。

评论

主要评分

4.7
评分不足

次要评分

新颖性
-
重要性
-
科学严谨性
-
评价这篇论文

推荐

暂无数据
暂无数据