☆ 3.8 Proceedings Paper

Calibrating Probability with Undersampling for Unbalanced Classification

2015 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (IEEE SSCI) (2015)

期刊

2015 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (IEEE SSCI)

卷 -, 期 -, 页码 159-166

出版社

IEEE

DOI: 10.1109/SSCI.2015.33

关键词

类别

Computer Science, Artificial Intelligence

向作者/读者索取更多资源

Protocol

社区支持

Reagent

社区支持

摘要

Undersampling is a popular technique for unbalanced datasets to reduce the skew in class distributions. However, it is well-known that undersampling one class modifies the priors of the training set and consequently biases the posterior probabilities of a classifier [9]. In this paper, we study analytically and experimentally how undersampling affects the posterior probability of a machine learning model. We formalize the problem of undersampling and explore the relationship between conditional probability in the presence and absence of undersampling. Although the bias due to undersampling does not affect the ranking order returned by the posterior probability, it significantly impacts the classification accuracy and probability calibration. We use Bayes Minimum Risk theory to find the correct classification threshold and show how to adjust it after undersampling. Experiments on several real-world unbalanced datasets validate our results.

Calibrating Probability with Undersampling for Unbalanced Classification

期刊

2015 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (IEEE SSCI)

出版社

IEEE

关键词

类别

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

Calibrating Probability with Undersampling for Unbalanced Classification

期刊

2015 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (IEEE SSCI)

出版社

IEEE

关键词

类别

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

导出引文

分享论文