Adapting SVM for data sparseness and imbalance: a case study in information extraction

NATURAL LANGUAGE ENGINEERING (2009)

Journal

NATURAL LANGUAGE ENGINEERING

Volume 15, Issue -, Pages 241-271

Publisher

CAMBRIDGE UNIV PRESS

DOI: 10.1017/S1351324908004968

Categories

Computer Science, Artificial Intelligence; Linguistics; Language & Linguistics

Abstract

Support Vector Machines (SVM) have been used successfully in many Natural Language Processing (NLP) tasks. The novel contribution of this paper is an investigation of two techniques for making SVM more suitable for language learning tasks. Firstly, we propose an SVM with uneven margins (SVMUM) model to deal with the problem of imbalanced training data. Secondly, SVM active learning is employed in order to alleviate the difficulty of obtaining labelled training data. The algorithms are presented and evaluated on several Information Extraction (IE) tasks, where they achieved better performance than the standard SVM and the SVM with passive learning, respectively. Moreover, by combining SVMUM with the active learning algorithm, we achieve the best reported results on the seminars and jobs corpora, which are benchmark data sets used for evaluation and comparison of machine learning algorithms for IE. In addition, we evaluate the token-based classification framework for IE with three different entity tagging schemes. In comparison to previous methods dealing with the same problems, our methods are both effective and efficient, which are valuable features for real-world applications. Due to the similarity in the formulation of the learning problem for IE and for other NLP tasks, the two techniques are likely to be beneficial in a wide range of applications.
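The abstract does not say how the active learner chooses which examples to label. As a rough illustration only, the sketch below assumes a common margin-based strategy: given an already trained linear classifier, query the unlabelled examples whose scores lie closest to the decision boundary. The names `decision_value` and `select_queries`, the weight vector, and the toy pool are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def decision_value(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Signed score w.x + b of a linear classifier (assumed already trained)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def select_queries(
    w: Sequence[float],
    b: float,
    pool: Sequence[Sequence[float]],
    k: int,
) -> List[int]:
    """Return indices of the k unlabelled examples closest to the hyperplane.

    Assumption: a small |w.x + b| means the classifier is least certain,
    so these examples are the most informative ones to send for labelling.
    """
    ranked = sorted(
        range(len(pool)),
        key=lambda i: abs(decision_value(w, b, pool[i])),
    )
    return ranked[:k]


# Toy example with a hypothetical 2-D classifier and unlabelled pool.
w, b = [1.0, -1.0], 0.0
pool = [[3.0, 0.0], [1.0, 0.5], [0.0, 2.0], [0.25, 0.0]]
queries = select_queries(w, b, pool, k=2)
print(queries)
```

In a full active learning loop, the selected examples would be labelled by an annotator, added to the training set, and the classifier retrained before the next round of selection.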
