Article

TLUSBoost algorithm: a boosting solution for class imbalance problem

Journal

SOFT COMPUTING
Volume 23, Issue 21, Pages 10755-10767

Publisher

SPRINGER
DOI: 10.1007/s00500-018-3629-4

Keywords

Undersampling; Boosting; Data mining; Class imbalance problem; Tomek-link pair

It is commonly assumed that the training sets used for learning are balanced. However, this assumption does not always hold in real-world applications, and traditional data mining algorithms trained on imbalanced data tend to build suboptimal classification models that are biased towards the overrepresented class. This class imbalance problem is common to many application domains such as data mining, machine learning and pattern recognition. Several techniques have been proposed to alleviate it. RUSBoost is an ensemble learning approach that addresses class imbalance by using random undersampling (RUS) for data resampling and the AdaBoost technique for boosting. However, RUS may discard significant information from the dataset. Therefore, this paper proposes the Tomek-link undersampling-based boosting (TLUSBoost) algorithm, which uses Tomek-linked and redundancy-based undersampling (TLRUS) for data resampling and the AdaBoost technique for boosting. TLRUS finds outliers using the Tomek-link concept and then eliminates some of the probable redundant instances from among the outliers. Hence, the algorithm reduces the loss of information and conserves the characteristics of the dataset, thereby helping the classifier to be trained appropriately. TLUSBoost is validated on 16 benchmark datasets and compared with the EasyEnsemble, BalanceCascade, SMOTEBoost and RUSBoost algorithms. Ten-fold cross-validation is applied to measure the overall accuracy and F-measure of the models. Experimental results show that the proposed model outperforms EasyEnsemble, BalanceCascade, SMOTEBoost and RUSBoost in both overall accuracy and F-measure.
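
To make the Tomek-link concept mentioned above concrete, the following is a minimal sketch of the classic definition: two instances form a Tomek link when they belong to different classes and each is the other's nearest neighbour. It is not the paper's TLRUS procedure, whose outlier and redundancy steps the abstract does not specify. The function names (`nearest_neighbor`, `tomek_links`, `undersample_majority`), the Euclidean distance, the toy data, and the rule of dropping the majority-class member of each link are all assumptions made for illustration.

```python
import math

def nearest_neighbor(points, i):
    """Index of the point closest to points[i] (Euclidean), excluding i itself."""
    best_j, best_d = None, math.inf
    for j, q in enumerate(points):
        if j == i:
            continue
        d = math.dist(points[i], q)
        if d < best_d:  # strict '<' keeps the first point on ties
            best_j, best_d = j, d
    return best_j

def tomek_links(points, labels):
    """Pairs (i, j), i < j, that are mutual nearest neighbours with different labels."""
    nn = [nearest_neighbor(points, i) for i in range(len(points))]
    links = []
    for i, j in enumerate(nn):
        if i < j and nn[j] == i and labels[i] != labels[j]:
            links.append((i, j))
    return links

def undersample_majority(points, labels, majority_label):
    """Illustrative rule (an assumption): drop the majority-class member of each link."""
    drop = set()
    for i, j in tomek_links(points, labels):
        for k in (i, j):
            if labels[k] == majority_label:
                drop.add(k)
    keep = [k for k in range(len(points)) if k not in drop]
    return [points[k] for k in keep], [labels[k] for k in keep]

# Hypothetical toy data: class 0 is the majority, class 1 the minority.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (3.2, 3.0), (3.0, 3.0), (3.0, 3.5)]
labels = [0, 0, 0, 0, 1, 1]

links = tomek_links(points, labels)
kept_points, kept_labels = undersample_majority(points, labels, majority_label=0)
print(links)
print(kept_labels)
```

In this toy set, the majority instance at (3.2, 3.0) sits right beside a minority instance, so the two form a Tomek link and only the majority member is removed. The brute-force neighbour search is quadratic in the number of instances, which is adequate for a sketch but not for large datasets.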
