A hybrid knowledge-based and data-driven approach to identifying semantically similar concepts

JOURNAL OF BIOMEDICAL INFORMATICS (2012)

Journal

JOURNAL OF BIOMEDICAL INFORMATICS

Volume 45, Issue 3, Pages 471-481

Publisher

ACADEMIC PRESS INC ELSEVIER SCIENCE

DOI: 10.1016/j.jbi.2012.01.002

Keywords

Semantic similarity; SNOMED-CT; Distributional semantics; Graph-based metrics; Ontologies

Funding

National Library of Medicine pre-doctoral fellowship [5T15LM007079-19]
NLM contract [HHSN276201000024C]
[R01 LM010027]

Abstract

An open research question when leveraging ontological knowledge is when to treat different concepts separately from each other and when to aggregate them. For instance, concepts for the terms paroxysmal cough and nocturnal cough might be aggregated in a kidney disease study, but should be left separate in a pneumonia study. Determining whether two concepts are similar enough to be aggregated can help build better datasets for data mining purposes and avoid signal dilution. Quantifying the similarity among concepts is a difficult task, however, in part because such similarity is context-dependent. We propose a comprehensive method, which computes a similarity score for a concept pair by combining data-driven and ontology-driven knowledge. We demonstrate our method on concepts from SNOMED-CT and on a corpus of clinical notes of patients with chronic kidney disease. By combining information from usage patterns in clinical notes and from ontological structure, the method can separate concepts that are merely related from those that are semantically similar. When evaluated against a list of concept pairs annotated for similarity, our method reaches an AUC (area under the curve) of 92%. © 2012 Elsevier Inc. All rights reserved.
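The abstract does not spell out how the two knowledge sources are combined, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a path-based score over a toy is-a graph as the ontology-driven signal, cosine similarity of co-occurrence counts as the data-driven signal, and a simple weighted average to merge them. The graph `IS_A`, the counts in `CONTEXTS`, the weight `alpha`, and all function names are hypothetical and chosen purely for illustration.

```python
from collections import deque
from math import sqrt

# Hypothetical toy is-a edges (child -> parents); NOT taken from SNOMED-CT.
IS_A = {
    "paroxysmal cough": ["cough"],
    "nocturnal cough": ["cough"],
    "cough": ["respiratory finding"],
    "dyspnea": ["respiratory finding"],
    "respiratory finding": [],
}

# Hypothetical co-occurrence counts standing in for usage patterns in notes.
CONTEXTS = {
    "paroxysmal cough": {"night": 1, "wheeze": 2},
    "nocturnal cough": {"night": 3, "wheeze": 1},
}


def path_similarity(a, b, is_a=IS_A):
    """Ontology-driven score: 1 / (1 + shortest path length) in the is-a graph."""
    # Treat is-a edges as undirected so siblings connect through their parent.
    adj = {}
    for child, parents in is_a.items():
        adj.setdefault(child, set())
        for parent in parents:
            adj[child].add(parent)
            adj.setdefault(parent, set()).add(child)
    if a not in adj or b not in adj:
        return 0.0
    # Breadth-first search finds the shortest path between the two concepts.
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return 1.0 / (1 + dist)
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return 0.0  # concepts are not connected


def cosine(u, v):
    """Data-driven score: cosine similarity of two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in set(u) & set(v))
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def hybrid_similarity(a, b, contexts=CONTEXTS, alpha=0.5):
    """Weighted average of the two scores; alpha is an assumed mixing weight."""
    data_score = cosine(contexts.get(a, {}), contexts.get(b, {}))
    return alpha * path_similarity(a, b) + (1 - alpha) * data_score


score = hybrid_similarity("paroxysmal cough", "nocturnal cough")
```

In a sketch like this, thresholding `score` would decide whether to aggregate a concept pair, and the threshold (like `alpha`) would presumably need tuning per study, since the abstract stresses that similarity is context-dependent.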
