Article

Leveraging output term co-occurrence frequencies and latent associations in predicting medical subject headings

DATA & KNOWLEDGE ENGINEERING (2014)

Journal

DATA & KNOWLEDGE ENGINEERING

Volume 94, Issue -, Pages 189-201

Publisher

ELSEVIER

DOI: 10.1016/j.datak.2014.09.002

Keywords

Medical subject headings; Multi-label classification; Output label associations; Reflective random indexing

Categories

Computer Science, Artificial Intelligence; Computer Science, Information Systems

Funding

National Center for Advancing Translational Sciences [UL1TR000117]

Abstract

Trained indexers at the National Library of Medicine (NLM) manually tag each biomedical abstract with the most suitable terms from the Medical Subject Headings (MeSH) terminology to be indexed by their PubMed information system. MeSH has over 26,000 terms and indexers look at each article's full text while assigning the terms. Recent automated attempts focused on using the article title and abstract text to identify MeSH terms for the corresponding article. Most of these approaches used supervised machine learning techniques that use already indexed articles and the corresponding MeSH terms. In this paper, we present a new indexing approach that leverages term co-occurrence frequencies and latent term associations computed using MeSH term sets corresponding to a set of nearly 18 million articles already indexed with MeSH terms by indexers at NLM. The main goal of our study is to gauge the potential of output label co-occurrences, latent associations, and relationships extracted from free text in both unsupervised and supervised indexing approaches. In this paper, using a novel and purely unsupervised approach, we achieve a micro-F-score that is comparable to those obtained using supervised machine learning techniques. By incorporating term co-occurrence and latent association features into a supervised learning framework, we also improve over the best results published on two public datasets. (C) 2014 Elsevier B.V. All rights reserved.
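To make the two quantities named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It counts how often pairs of output labels co-occur across already indexed articles, ranks candidate terms by their summed co-occurrence with a set of seed terms, and computes a micro-averaged F-score. The function names (`cooccurrence_counts`, `rank_by_cooccurrence`, `micro_f1`) and the scoring rule are assumptions made for illustration; the latent associations via reflective random indexing mentioned in the keywords are not modelled here.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(label_sets):
    """Count how often each unordered label pair appears on the same article."""
    counts = Counter()
    for labels in label_sets:
        # Sort so that each pair has one canonical (a, b) ordering.
        for a, b in combinations(sorted(set(labels)), 2):
            counts[(a, b)] += 1
    return counts


def rank_by_cooccurrence(seed_terms, counts, top_k=3):
    """Rank non-seed terms by summed co-occurrence with the seed terms.

    Hypothetical scoring rule chosen for illustration only.
    """
    scores = Counter()
    for (a, b), n in counts.items():
        if a in seed_terms and b not in seed_terms:
            scores[b] += n
        elif b in seed_terms and a not in seed_terms:
            scores[a] += n
    return [term for term, _ in scores.most_common(top_k)]


def micro_f1(gold_sets, predicted_sets):
    """Micro-averaged F-score: pool TP/FP/FN over all articles, then combine."""
    tp = fp = fn = 0
    for gold, pred in zip(gold_sets, predicted_sets):
        gold, pred = set(gold), set(pred)
        tp += len(gold & pred)
        fp += len(pred - gold)
        fn += len(gold - pred)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Because the micro-average pools counts over all articles before computing precision and recall, frequently assigned terms weigh more heavily in the score than rare ones.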
