4.6 Article

Certainty-based active learning for sampling imbalanced datasets

Journal

NEUROCOMPUTING
Volume 119, Pages 350-358

Publisher

ELSEVIER
DOI: 10.1016/j.neucom.2013.03.023

Keywords

Active learning; Imbalanced data classification; Neighborhood exploration; Certainty-based neighborhood; Local classification behavior

Funding

  1. NSC, Taiwan, ROC [NSC 99-2221-8-194-023]


Active learning aims to learn an accurate classifier with as few queried labels as possible. For practical applications, we propose a Certainty-Based Active Learning (CBAL) algorithm to address the imbalanced data classification problem in active learning. The importance of each unlabeled sample is carefully measured within an explored neighborhood, so that the measure is not distorted by irrelevant samples that might overwhelm the minority class. To handle the agnostic case, IWAL-ERM is integrated into our approach without additional cost. CBAL thus determines a query probability for each unlabeled sample within its explored neighborhood. The potential neighborhood is explored incrementally, so the neighborhood size does not need to be defined in advance. Our theoretical analysis shows that CBAL achieves a polynomial label-query improvement over passive learning, and experimental results on synthetic and real-world datasets show that CBAL can identify informative samples and handle the imbalanced data classification problem in active learning. (C) 2013 Elsevier B.V. All rights reserved.
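
Illustrative sketch (not from the paper)

For intuition only, the Python sketch below illustrates the general idea of a certainty-based query strategy in the spirit of the abstract: each unlabeled sample's local certainty is estimated from already-labeled neighbors, low certainty maps to a high query probability, and the neighborhood is widened incrementally across rounds. The certainty measure, the query-probability mapping, the neighborhood-growth rule, and the toy imbalanced pool are all simplifying assumptions; this is not the authors' CBAL algorithm, and it omits the IWAL-ERM integration and theoretical guarantees described in the paper.

import numpy as np

rng = np.random.default_rng(0)

def neighborhood_certainty(x, X_lab, y_lab, k):
    # Local certainty: fraction of the majority label among the k nearest
    # labeled samples (a simplified stand-in for a certainty-based neighborhood).
    if len(y_lab) == 0:
        return 0.0
    dist = np.linalg.norm(X_lab - x, axis=1)
    nearest = y_lab[np.argsort(dist)[:min(k, len(y_lab))]]
    counts = np.bincount(nearest)
    return counts.max() / counts.sum()

def query_probability(certainty, floor=0.05):
    # Low local certainty -> high query probability; the positive floor keeps
    # query probabilities bounded away from zero (an assumption, not the paper's rule).
    return max(floor, 1.0 - certainty)

# Toy imbalanced binary pool: 95% majority class, 5% minority class.
X = np.vstack([rng.normal(0.0, 1.0, size=(380, 2)),
               rng.normal(3.0, 0.5, size=(20, 2))])
y = np.array([0] * 380 + [1] * 20)

labeled = list(rng.choice(len(X), size=5, replace=False))   # small labeled seed set
unlabeled = [i for i in range(len(X)) if i not in labeled]

k = 3  # initial neighborhood size, widened incrementally each round
for rnd in range(5):
    queried = []
    for i in unlabeled:
        c = neighborhood_certainty(X[i], X[labeled], y[labeled], k)
        if rng.random() < query_probability(c):
            queried.append(i)          # ask the oracle for this sample's label
    labeled += queried
    unlabeled = [i for i in unlabeled if i not in queried]
    k += 2                             # incrementally explore a larger neighborhood
    print(f"round {rnd}: queried {len(queried)} labels, total labeled {len(labeled)}")

In this toy setup, samples sitting in mixed-label regions (where the minority class borders the majority class) keep a low local certainty and are therefore queried with higher probability, while samples deep inside a homogeneous majority region are rarely queried.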
