期刊
EXPERT SYSTEMS WITH APPLICATIONS
卷 38, 期 12, 页码 14760-14772出版社
PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.eswa.2011.05.004
关键词
Natural language processing; Named entity recognition; Maximum Entropy (ME); Conditional Random Field (CRF); Support Vector Machine (SVM); Multiobjective optimization (MOO); Simulated annealing (SA); Classifier ensemble; Weighted voting
In this paper, we propose a simulated annealing (SA) based multiobjective optimization (MOO) approach for classifier ensemble. Several different versions of the objective functions are exploited. We hypothesize that the reliability of prediction of each classifier differs among the various output classes. Thus, in an ensemble system, it is necessary to find out the appropriate weight of vote for each output class in each classifier. Diverse classification methods such as Maximum Entropy (ME), Conditional Random Field (CRF) and Support Vector Machine (SVM) are used to build different models depending upon the various representations of the available features. One most important characteristics of our system is that the features are selected and developed mostly without using any deep domain knowledge and/or language dependent resources. The proposed technique is evaluated for Named Entity Recognition (NER) in three resource-poor Indian languages, namely Bengali, Hindi and Telugu. Evaluation results yield the recall, precision and F-measure values of 93.95%, 95.15% and 94.55%, respectively for Bengali, 93.35%, 92.25% and 92.80%, respectively for Hindi and 84.02%, 96.56% and 89.85%, respectively for Telugu. Experiments also suggest that the classifier ensemble identified by the proposed MOO based approach optimizing the F-measure values of named entity (NE) boundary detection outperforms all the individual models, two conventional baseline models and three other MOO based ensembles. (C) 2011 Elsevier Ltd. All rights reserved.
作者
我是这篇论文的作者
点击您的名字以认领此论文并将其添加到您的个人资料中。
推荐
暂无数据