4.6 Article

Identifying well-formed biomedical phrases in MEDLINE® text

期刊

JOURNAL OF BIOMEDICAL INFORMATICS
卷 45, 期 6, 页码 1035-1041

出版社

ACADEMIC PRESS INC ELSEVIER SCIENCE
DOI: 10.1016/j.jbi.2012.05.005

关键词

Machine learning; Imbalanced data; Biomedical phrases; Statistical phrase identification; Unified medical language system; Abbreviation full forms

资金

  1. Intramural Research Program of the NIH, National Library of Medicine

向作者/读者索取更多资源

In the modern world people frequently interact with retrieval systems to satisfy their information needs. Humanly understandable well-formed phrases represent a crucial interface between humans and the web, and the ability to index and search with such phrases is beneficial for human-web interactions. In this paper we consider the problem of identifying humanly understandable, well formed, and high quality biomedical phrases in MEDLINE documents. The main approaches used previously for detecting such phrases are syntactic, statistical, and a hybrid approach combining these two. In this paper we propose a supervised learning approach for identifying high quality phrases. First we obtain a set of known well-formed useful phrases from an existing source and label these phrases as positive. We then extract from MEDLINE a large set of multiword strings that do not contain stop words or punctuation. We believe this unlabeled set contains many well-formed phrases. Our goal is to identify these additional high quality phrases. We examine various feature combinations and several machine learning strategies designed to solve this problem. A proper choice of machine learning methods and features identifies in the large collection strings that are likely to be high quality phrases. We evaluate our approach by making human judgments on multiword strings extracted from MEDLINE using our methods. We find that over 85% of such extracted phrase candidates are humanly judged to be of high quality. Published by Elsevier Inc.

作者

我是这篇论文的作者
点击您的名字以认领此论文并将其添加到您的个人资料中。

评论

主要评分

4.6
评分不足

次要评分

新颖性
-
重要性
-
科学严谨性
-
评价这篇论文

推荐

暂无数据
暂无数据