A Collection of Benchmark Data Sets for Knowledge Graph-based Similarity in the Biomedical Domain

DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION (2020)

Publisher

OXFORD UNIV PRESS

DOI: 10.1093/database/baaa078

Funding

Portuguese FCT through the LASIGE Research Unit [UIDB/00408/2020, UIDP/00408/2020, PTDC/EEI-ESS/4633/2014, SFRH/BD/145377/2019]

Abstract

The ability to compare entities within a knowledge graph is a cornerstone technique for several applications, ranging from the integration of heterogeneous data to machine learning. It is of particular importance in the biomedical domain, where semantic similarity can be applied to the prediction of protein-protein interactions, disease-gene associations and the cellular localization of proteins, among other tasks. In recent years, several knowledge graph-based semantic similarity measures have been developed, but building a gold standard data set to support their evaluation is nontrivial. We present a collection of 21 benchmark data sets that aim to circumvent the difficulties in building benchmarks for large biomedical knowledge graphs by exploiting proxies for biomedical entity similarity. These data sets include data from two successful biomedical ontologies, the Gene Ontology and the Human Phenotype Ontology, and explore proxy similarities calculated from protein sequence similarity, protein family similarity, protein-protein interactions and phenotype-based gene similarity. The data sets vary in size and cover four different species at different levels of annotation completion. For each data set, we also provide semantic similarity computations with state-of-the-art representative measures.
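As a rough illustration of how a proxy-based benchmark of this kind can be used, the sketch below scores entity pairs with a simple annotation-overlap measure and checks how well those scores correlate with a proxy similarity. Everything in it is an assumption made for illustration: the Jaccard overlap stands in for a real semantic similarity measure (it is not one of the measures provided with the data sets), Pearson correlation is just one possible agreement statistic, and the protein identifiers, annotation terms and proxy values are invented toy data.

```python
from math import sqrt


def jaccard(a: set[str], b: set[str]) -> float:
    """Set-overlap similarity between two annotation sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical toy data: placeholder proteins annotated with placeholder terms.
annotations = {
    "P1": {"TERM:A", "TERM:B", "TERM:C"},
    "P2": {"TERM:A", "TERM:B"},
    "P3": {"TERM:C", "TERM:D"},
    "P4": {"TERM:E"},
}

# Hypothetical benchmark rows: (entity 1, entity 2, proxy similarity),
# where the proxy could be, for example, a normalized sequence similarity.
pairs = [
    ("P1", "P2", 0.9),
    ("P1", "P3", 0.4),
    ("P2", "P3", 0.1),
    ("P1", "P4", 0.0),
]

# Score every pair with the stand-in semantic measure, then compare to the proxy.
semantic = [jaccard(annotations[a], annotations[b]) for a, b, _ in pairs]
proxy = [p for _, _, p in pairs]
r = pearson(semantic, proxy)

print(f"Pearson correlation between semantic and proxy similarity: {r:.3f}")
```

A higher correlation suggests that the semantic measure tracks the proxy more closely on that data set. Swapping `jaccard` for an ontology-aware measure would leave the rest of the evaluation loop unchanged.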
