Article

Collecting specialty-related medical terms: Development and evaluation of a resource for Spanish

BMC MEDICAL INFORMATICS AND DECISION MAKING (2021)

Journal

BMC MEDICAL INFORMATICS AND DECISION MAKING

Volume 21, Issue 1, Pages -

Publisher

BMC

DOI: 10.1186/s12911-021-01495-w

Keywords

Natural language processing; Vocabulary; Medical sub-language; Clinical specialty; Medical sub-domain

Category

Medical Informatics

Funding

LIVING-LANG project of the Spanish Government [RTI2018-094653-B-C21]
Alianza CAOBA Colombia [671-2019]
Fondo Europeo de Desarrollo Regional (FEDER)

Summary
Abstract

This paper proposes a method that automatically extracts Spanish medical terms from MEDLINE titles and abstracts and classifies and weights them per sub-domain, with the aim of reducing ambiguity and improving NLP tasks in specific clinical specialties. The resulting specialized term set, SCOVACLIS, improved a text classification task by 6 percentage points in F-measure over a baseline. The study supports the hypothesis that specialty-specific term sets reduce ambiguity compared with a general vocabulary, and it highlights the value of domain-specific resources for biomedical NLP tasks.

Background: Controlled vocabularies are fundamental resources for information extraction from clinical texts using natural language processing (NLP). Standard language resources available in the healthcare domain, such as the UMLS Metathesaurus or SNOMED CT, are widely used for this purpose, but they have limitations such as the lexical ambiguity of clinical terms. However, most such terms are unambiguous within text limited to a given clinical specialty. This is one rationale, among others, for classifying clinical texts by the clinical specialty to which they belong.

Results: This paper addresses this limitation by proposing and applying a method that automatically extracts Spanish medical terms, classified and weighted per sub-domain, using Spanish MEDLINE titles and abstracts as input. The hypothesis is that biomedical NLP tasks benefit from collections of domain terms that are specific to clinical sub-domains. We use PubMed queries to generate sub-domain-specific corpora from Spanish titles and abstracts, from which token n-grams are collected and metrics of relevance, discriminatory power, and broadness per sub-domain are computed. The generated term set, called Spanish core vocabulary about clinical specialties (SCOVACLIS), was made available to the scientific community and used in a text classification problem, obtaining an improvement of 6 percentage points in F-measure over the baseline with a Multilayer Perceptron, which supports the hypothesis that a specialized term set improves NLP tasks.

Conclusion: The creation and validation of SCOVACLIS support the hypothesis that specific term sets reduce the level of ambiguity when compared to a specialty-independent and broad-scope vocabulary.
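The abstract describes collecting token n-grams from sub-domain corpora and scoring them for relevance, discriminatory power, and broadness, but it does not give the formulas. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function names `ngrams` and `term_metrics`, the toy corpora, and all three metric definitions (in-domain relative frequency, share of a term's occurrences falling in one sub-domain, and number of sub-domains containing the term) are hypothetical stand-ins chosen only to illustrate the idea.

```python
import re
from collections import Counter


def ngrams(text, max_n=2):
    """Yield lowercase token n-grams (n = 1..max_n) from one document."""
    tokens = re.findall(r"\w+", text.lower())
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def term_metrics(corpora, max_n=2):
    """Score every (term, sub-domain) pair.

    All three definitions are illustrative assumptions,
    not the formulas used in the paper.
    """
    # n-gram counts per sub-domain
    counts = {
        domain: Counter(g for doc in docs for g in ngrams(doc, max_n))
        for domain, docs in corpora.items()
    }
    # n-gram counts over all sub-domains together
    totals = Counter()
    for c in counts.values():
        totals.update(c)

    metrics = {}
    for domain, c in counts.items():
        domain_total = sum(c.values())
        for term, freq in c.items():
            metrics[(term, domain)] = {
                # assumed: relative frequency inside the sub-domain
                "relevance": freq / domain_total,
                # assumed: share of the term's occurrences in this sub-domain
                "discrimination": freq / totals[term],
                # assumed: number of sub-domains in which the term occurs
                "broadness": sum(1 for other in counts.values() if term in other),
            }
    return metrics


# Toy stand-ins for corpora that would come from PubMed queries.
corpora = {
    "cardiology": ["infarto agudo de miocardio", "insuficiencia cardiaca aguda"],
    "neurology": ["infarto cerebral agudo", "esclerosis múltiple"],
}
metrics = term_metrics(corpora)
```

In this toy data, "infarto" occurs once in each sub-domain, so under these assumed definitions its discrimination score for cardiology is 0.5 and its broadness is 2, whereas "miocardio" occurs only in cardiology and scores 1.0 and 1. A term like the latter is the kind a specialty-specific vocabulary would favor.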
