Article

Identifying named entities from PubMed® for enriching semantic categories

BMC BIOINFORMATICS (2015)

Journal

BMC BIOINFORMATICS

Volume 16, Issue -, Pages -

Publisher

BMC

DOI: 10.1186/s12859-015-0487-2

Keywords

Semantic term extraction; Natural language processing; Machine learning

Categories

Biochemical Research Methods; Biotechnology & Applied Microbiology; Mathematical & Computational Biology

Funding

NIH, National Library of Medicine

Abstract

Background: Controlled vocabularies such as the Unified Medical Language System (UMLS®) and Medical Subject Headings (MeSH®) are widely used for biomedical natural language processing (NLP) tasks. However, the standard terminology in such collections suffers from low usage in biomedical literature; for example, only 13% of UMLS terms appear in MEDLINE®.

Results: We propose an efficient and effective method for extracting noun phrases for biomedical semantic categories. The approach uses simple linguistic patterns to select candidate noun phrases based on headwords, and a machine learning classifier then filters out noisy phrases. In the experiments, three NLP rules were tested and manually evaluated by three annotators. Our approaches showed over 93% precision on average for the headwords gene, protein, disease, cell and cells.

Conclusions: Although biomedical terms in knowledge-rich resources may define semantic categories, variations of the controlled terms in literature are still difficult to identify. The method proposed here is an effort to narrow the gap between controlled vocabularies and the entities used in text. Our extraction method cannot completely eliminate manual evaluation; however, a simple and automated solution with high precision provides a convenient way of enriching semantic categories by incorporating terms obtained from the literature.
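As a rough illustration of the headword-based candidate selection described in the abstract, the following Python sketch collects the words that immediately precede a headword and then applies a filter. The abstract does not spell out the paper's actual linguistic patterns or classifier, so everything here is an assumption: the function names `candidate_phrases` and `keep_phrase`, the `BOUNDARY_WORDS` stop list, the three-modifier limit, and the numeric-only filter (a trivial stand-in for the machine learning classifier) are all hypothetical.

```python
import re

# Headwords named in the abstract.
HEADWORDS = ("gene", "protein", "disease", "cell", "cells")

# Hypothetical stop list: a candidate phrase may not extend leftwards past these words.
BOUNDARY_WORDS = {
    "the", "a", "an", "of", "in", "and", "or", "is", "are",
    "with", "by", "this", "that", "to", "for", "was", "were",
}


def candidate_phrases(sentence, headwords=HEADWORDS, max_modifiers=3):
    """Return phrases of 1..max_modifiers words followed by a headword.

    This is an illustrative pattern, not the rules used in the paper.
    """
    tokens = re.findall(r"[A-Za-z0-9][A-Za-z0-9\-]*", sentence)
    phrases = []
    for i, token in enumerate(tokens):
        if token.lower() not in headwords:
            continue
        start = i
        # Walk leftwards, collecting modifiers until a boundary word or the limit.
        while (
            start > 0
            and i - start < max_modifiers
            and tokens[start - 1].lower() not in BOUNDARY_WORDS
        ):
            start -= 1
        if start < i:  # require at least one modifier before the headword
            phrases.append(" ".join(tokens[start:i + 1]))
    return phrases


def keep_phrase(phrase):
    """Placeholder for the classifier: reject phrases whose modifiers are all numeric."""
    modifiers = phrase.split()[:-1]
    return any(not m.isdigit() for m in modifiers)


if __name__ == "__main__":
    text = "Mutations in the BRCA1 tumor suppressor gene increase risk."
    for phrase in candidate_phrases(text):
        if keep_phrase(phrase):
            print(phrase)
```

A pattern this simple also picks up noisy spans (for example, a verb that happens to precede a headword), which is the kind of output a trained classifier would be expected to filter in place of the placeholder above.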
