Article

An approach for classification of highly imbalanced data using weighting and undersampling

Journal

AMINO ACIDS
Volume 39, Issue 5, Pages 1385-1391

Publisher

SPRINGER WIEN
DOI: 10.1007/s00726-010-0595-2

Keywords

Imbalanced datasets; SVM; Undersampling technique

Funding

  1. Agency for Science, Technology, and Research, Singapore (A*Star) [052 101 0020]

Real-world datasets are commonly imbalanced. Several approaches exist for handling such data, including weighting, sub-sampling, and data modeling. Learning in the presence of class imbalance presents a great challenge to machine learning. Techniques such as support-vector machines perform excellently on balanced data but may fail when applied to imbalanced datasets. In this paper, we propose a new undersampling technique for selecting instances from the majority class. The performance of this approach was evaluated on several real, imbalanced biological datasets. The ratios of negative to positive samples range from ~9:1 to ~100:1. Useful classifiers have both high sensitivity and high specificity. Our results demonstrate that the proposed selection technique improves sensitivity compared with a weighted support-vector machine and with results available in the literature for the same datasets.
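
The abstract does not describe how the proposed technique selects majority-class instances, so the sketch below is not the authors' method. It only illustrates the three generic ingredients the abstract names, under stated assumptions: plain random undersampling of the majority class as a stand-in, inverse-frequency class weights (the common "balanced" heuristic, n_samples / (n_classes * class_count)), and the sensitivity and specificity measures. All function names are hypothetical and chosen for illustration.

```python
import random
from collections import Counter


def undersample_majority(samples, labels, ratio=1.0, seed=0):
    """Randomly keep about ratio * n_minority majority-class samples.

    Assumption: random selection stands in for the paper's selection
    technique, which the abstract does not specify.
    """
    counts = Counter(labels)
    majority = max(counts, key=counts.get)
    n_minority = sum(c for lab, c in counts.items() if lab != majority)
    n_keep = min(counts[majority], int(round(ratio * n_minority)))

    rng = random.Random(seed)
    majority_idx = [i for i, lab in enumerate(labels) if lab == majority]
    keep = set(rng.sample(majority_idx, n_keep))

    # Keep every minority sample and only the sampled majority ones.
    idx = [i for i, lab in enumerate(labels) if lab != majority or i in keep]
    return [samples[i] for i in idx], [labels[i] for i in idx]


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {lab: n / (k * c) for lab, c in counts.items()}


def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity); assumes both classes occur in y_true."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


# Toy data at the ~9:1 negative-to-positive ratio mentioned in the abstract.
toy_labels = [1] * 10 + [0] * 90
toy_samples = list(range(100))
balanced_samples, balanced_labels = undersample_majority(toy_samples, toy_labels)
toy_weights = balanced_class_weights(toy_labels)
```

With weighting, the classifier sees every sample but penalizes minority-class errors more heavily; with undersampling, it sees a smaller, more balanced training set. Either way, sensitivity and specificity should be reported together, because accuracy alone can look high on imbalanced data even when most positives are missed.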
