Article

Parallel selective sampling method for imbalanced and large data classification

PATTERN RECOGNITION LETTERS (2015)

Journal

PATTERN RECOGNITION LETTERS

Volume 62, Pages 61-67

Publisher

ELSEVIER

DOI: 10.1016/j.patrec.2015.05.008

Keywords

Imbalanced learning; Classification; Support vector machine; Selective sampling methods

Category

Computer Science, Artificial Intelligence

Abstract

Several applications aim to identify rare events in very large data sets. Classification algorithms can be severely limited on large data sets and can suffer performance degradation due to class imbalance. Many solutions have been proposed in the literature to deal with either large data volumes or class imbalance, but separately. In this paper we assessed the performance of a novel method, Parallel Selective Sampling (PSS), which selects data from the majority class to reduce imbalance in large data sets. PSS was combined with Support Vector Machine (SVM) classification. PSS-SVM performed excellently on synthetic data sets, much better than SVM alone. Moreover, we showed that on real data sets PSS-SVM classifiers performed slightly better than SVM and RUSBoost classifiers, with reduced processing times. Indeed, the proposed strategy was conceived and designed for parallel and distributed computing. In conclusion, PSS-SVM is a valuable alternative to SVM and RUSBoost for classifying huge, imbalanced data sets, owing to its accurate statistical predictions and low computational complexity. (C) 2015 The Authors. Published by Elsevier B.V.
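The abstract does not spell out the PSS selection criterion, so the following is only a minimal sketch of the general shape it describes. The majority class is split into chunks that can be processed independently (and so in parallel or on separate machines). Each chunk is reduced, and the survivors are merged into a smaller, more balanced training set. The selection rule used here is a hypothetical stand-in and not the authors' rule: it keeps the majority points nearest to the minority class, similar in spirit to NearMiss-style undersampling. All function names and parameters (`parallel_undersample`, `select_from_chunk`, `n_chunks`, `ratio`) are illustrative assumptions.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from functools import partial

Point = tuple[float, ...]


def nearest_minority_distance(point: Point, minority: list[Point]) -> float:
    """Euclidean distance from one majority point to its closest minority point."""
    return min(math.dist(point, m) for m in minority)


def select_from_chunk(chunk: list[Point], minority: list[Point], keep: int) -> list[Point]:
    """Hypothetical selection rule (not the PSS rule): keep the `keep` points
    of the chunk that lie nearest to the minority class."""
    return sorted(chunk, key=lambda p: nearest_minority_distance(p, minority))[:keep]


def parallel_undersample(
    majority: list[Point], minority: list[Point], n_chunks: int = 4, ratio: float = 1.0
) -> list[Point]:
    """Reduce `majority` to about ratio * len(minority) points, map-reduce style."""
    if not majority or not minority or n_chunks < 1:
        raise ValueError("need non-empty classes and n_chunks >= 1")
    target = min(len(majority), max(1, round(ratio * len(minority))))
    size = math.ceil(len(majority) / n_chunks)
    chunks = [majority[i:i + size] for i in range(0, len(majority), size)]
    # Map: each chunk is filtered independently. Keeping `target` points per
    # chunk means no point that belongs in the final selection is discarded
    # early (ties aside).
    worker = partial(select_from_chunk, minority=minority, keep=target)
    with ThreadPoolExecutor() as pool:
        survivors = [p for part in pool.map(worker, chunks) for p in part]
    # Reduce: rank the merged survivors once more and keep the best `target`.
    return select_from_chunk(survivors, minority, target)


majority = [(float(x), 0.0) for x in range(10)]
minority = [(0.0, 1.0), (1.0, 1.0)]
print(parallel_undersample(majority, minority))
```

The balanced training set would then be `minority` plus the selected majority points, which could be passed to an SVM such as scikit-learn's `SVC`. Threads keep the sketch short. For CPU-bound pure-Python work in CPython, a `ProcessPoolExecutor` or a distributed framework would be needed for real speedups. `functools.partial` is used instead of a lambda so that the worker stays picklable for that swap.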
