Article

Improving classification rate constrained to imbalanced data between overlapped and non-overlapped regions by hybrid algorithms

Journal

NEUROCOMPUTING
Volume 152, Issue -, Pages 429-443

Publisher

ELSEVIER
DOI: 10.1016/j.neucom.2014.10.007

Keywords

Highly imbalanced data with overlapped region; Soft-hybrid algorithms; Responsive area mapping classifications

Funding

  1. Office of the Higher Education Commission, Ministry of Education (Thailand), as part of the Program Strategic Scholarships for Frontier Research Network
  2. Research Grant of Burapha University through National Research Council of Thailand [80/2556]

A new aspect of imbalanced data classification was studied. In classical imbalanced data classification the problem is caused by the difference in class sizes; this study instead concerns only the situation in which two classes overlap. When one class overlaps another, the overlap induces three regions: the overlapped region shared by the two classes, and the non-overlapped region of each class. The imbalance is then caused by the different amounts of data in the overlapped and non-overlapped regions. In this situation, the difference in size between the classes is not the main concern and has no effect on classification accuracy. A combined technique, called the Soft-Hybrid algorithm, was proposed to improve classification performance. The technique was divided into two main phases: boundary region determination, and responsive classification algorithms for each sub-area. In the first phase, data were grouped as (1) non-overlapping data, (2) borderline data, and (3) overlapping data. The data were learned using a modified Hausdorff distance, a Radial Basis Function (RBF) network, and K-Means clustering with the Mahalanobis distance. Then a modified kernel learning method, modified DBSCAN, and an RBF network were applied to classify the data into the proper groups based on statistical values from the classification phase. Finally, the results of all techniques were combined. The experimental results showed that the proposed method can significantly improve the effectiveness of classifying imbalanced data with large overlapping sections, as measured by TP rate, F-measure, and G-mean. Moreover, the computational times of the proposed method were lower than those of the standard algorithms used for this type of problem. © 2014 Elsevier B.V. All rights reserved.
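To make the three-region idea and the reported measures concrete, here is a minimal Python sketch. It is not the Soft-Hybrid algorithm: the abstract does not give the details of the modified Hausdorff distance, RBF network, or K-Means steps, so the hypothetical helper `region_labels` swaps in a simple k-nearest-neighbour heuristic, with an assumed neighbourhood size `k` and an assumed 0.5 threshold. The hypothetical helper `imbalance_metrics` uses the standard textbook definitions of TP rate, F-measure, and G-mean, which the paper is assumed to follow.

```python
import math


def region_labels(points, labels, k=3):
    """Tag each point as 'non-overlapping', 'borderline', or 'overlapping'.

    Stand-in heuristic (an assumption, not the paper's procedure): look at
    the k nearest neighbours of a point and measure what fraction of them
    belong to a different class.
    """
    regions = []
    for i, p in enumerate(points):
        # Distances from point i to every other point, nearest first.
        dists = sorted(
            (math.dist(p, q), labels[j])
            for j, q in enumerate(points)
            if j != i
        )
        other = sum(1 for _, lab in dists[:k] if lab != labels[i])
        frac = other / k
        if frac == 0:
            regions.append("non-overlapping")
        elif frac < 0.5:  # assumed threshold, chosen for illustration
            regions.append("borderline")
        else:
            regions.append("overlapping")
    return regions


def imbalance_metrics(y_true, y_pred, positive=1):
    """Return TP rate, F-measure, and G-mean for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    tp_rate = tp / (tp + fn) if tp + fn else 0.0      # recall / sensitivity
    tn_rate = tn / (tn + fp) if tn + fp else 0.0      # specificity
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + tp_rate
    f_measure = 2 * precision * tp_rate / denom if denom else 0.0
    g_mean = math.sqrt(tp_rate * tn_rate)
    return {"tp_rate": tp_rate, "f_measure": f_measure, "g_mean": g_mean}


if __name__ == "__main__":
    # Toy 1-D data: one class-0 point (10) sits inside the class-1 cluster.
    pts = [(0,), (1,), (2,), (3,), (10,),
           (9,), (11,), (12,), (20,), (21,), (22,), (23,)]
    labs = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
    for p, lab, r in zip(pts, labs, region_labels(pts, labs)):
        print(p, lab, r)
    print(imbalance_metrics([1, 1, 1, 0, 0, 0, 0, 0],
                            [1, 1, 0, 0, 0, 0, 0, 1]))
```

In this toy data the sizes of the two classes barely matter; what differs is how many points fall in each region, which is the kind of imbalance the abstract describes. G-mean is included because it drops toward zero whenever either class is classified poorly, so it is not inflated by good performance on the easy, non-overlapped points alone.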
