4.6 Article

Stacked ensemble combined with fuzzy matching for biomedical named entity recognition of diseases

Journal

JOURNAL OF BIOMEDICAL INFORMATICS
Volume 64, Issue -, Pages 1-9

Publisher

ACADEMIC PRESS INC ELSEVIER SCIENCE
DOI: 10.1016/j.jbi.2016.09.009

Keywords

Text mining; Biomedical named entity recognition; Machine learning; Stacked ensemble; Fuzzy matching

Funding

  1. DRDO BU Centre for Life Sciences, Bharathiar University, Coimbatore-641046, Tamil Nadu, India

Biomedical Named Entity Recognition (Bio-NER) is a crucial initial step in the information extraction process and a major research focus in biomedical text mining. In recent years, several models and methodologies have been proposed for recognizing semantic types such as genes, proteins, chemicals, drugs, and other biologically relevant named entities. In this paper, we implement a stacked ensemble approach combined with fuzzy matching for biomedical named entity recognition of disease names. The underlying concept of stacked generalization is to combine the outputs of base-level classifiers using a second-level meta-classifier in an ensemble. We use Conditional Random Fields (CRF) as the underlying classification method, with a diverse set of features that are mostly domain-specific, orthographic, and morphological. In addition, we use fuzzy string matching to tag rare disease names from our in-house disease dictionary. For fuzzy matching, we incorporate two search algorithms, Rabin-Karp and Tuned Boyer-Moore. Our proposed approach achieves promising F-measures of 94.66%, 89.12%, 84.10%, and 76.71% on the training and testing sets of the NCBI disease and BioCreative V CDR corpora. (C) 2016 Elsevier Inc. All rights reserved.
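
To illustrate the dictionary-lookup step mentioned in the abstract, the following is a minimal sketch of plain Rabin-Karp rolling-hash search used to tag dictionary terms in a sentence. It is not the authors' implementation: the abstract does not say how Rabin-Karp is adapted for fuzzy matching, so this version finds exact, case-insensitive occurrences only. The function names `rabin_karp` and `tag_diseases`, the hash parameters, and the two-entry dictionary are all assumptions made for illustration.

```python
from typing import List, Tuple


def rabin_karp(text: str, pattern: str,
               base: int = 256, mod: int = 1_000_000_007) -> List[int]:
    """Return the start offset of every exact occurrence of pattern in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    # Weight of the leading character in a window of length m.
    high = pow(base, m - 1, mod)
    p_hash = w_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        w_hash = (w_hash * base + ord(text[i])) % mod
    hits = []
    for start in range(n - m + 1):
        # Compare the strings on a hash match to rule out collisions.
        if w_hash == p_hash and text[start:start + m] == pattern:
            hits.append(start)
        if start < n - m:
            # Roll the window: drop text[start], append text[start + m].
            w_hash = ((w_hash - ord(text[start]) * high) * base
                      + ord(text[start + m])) % mod
    return hits


def tag_diseases(text: str, dictionary: List[str]) -> List[Tuple[int, int, str]]:
    """Return sorted (start, end, term) spans for dictionary terms found in text.

    Matching is case-insensitive; offsets assume lowercasing keeps the
    string length unchanged, which holds for ASCII text.
    """
    lowered = text.lower()
    spans = []
    for term in dictionary:
        for start in rabin_karp(lowered, term.lower()):
            spans.append((start, start + len(term), term))
    return sorted(spans)


# Hypothetical dictionary and sentence, for illustration only.
sentence = "Patients with Fabry disease or Pompe disease were enrolled."
print(tag_diseases(sentence, ["Fabry disease", "Pompe disease"]))
```

A rolling hash lets each window be checked in constant time on average, which matters when many dictionary terms are searched across a large corpus. Tolerating spelling variants would need an extra step on top of this, such as an edit-distance check on candidate spans.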
