Article

Finding the Best Classification Threshold in Imbalanced Classification

Journal

BIG DATA RESEARCH
Volume 5, Issue -, Pages 2-8

Publisher

ELSEVIER
DOI: 10.1016/j.bdr.2015.12.001

Keywords

Receiver Operating Characteristic (ROC); Protein remote homology detection; Imbalance data; F-score

Funding

  1. Natural Science Foundation of China [61370010, 61303004]
  2. Natural Science Foundation of Fujian Province of China [2014J01253]
  3. Major Program of the National Social Science Foundation of China [13ZD148]

Classification with imbalanced class distributions is a major problem in machine learning. Researchers have given considerable attention to the applications in many real-world scenarios. Although several works have utilized the area under the receiver operating characteristic (ROC) curve to select potentially optimal classifiers in imbalanced classifications, limited studies have been devoted to finding the classification threshold for testing or unknown datasets. In general, the classification threshold is simply set to 0.5, which is usually unsuitable for an imbalanced classification. In this study, we analyze the drawbacks of using ROC as the sole measure of imbalance in data classification problems. In addition, a novel framework for finding the best classification threshold is proposed. Experiments with SCOP v.1.53 data reveal that, with the default threshold set to 0.5, our proposed framework demonstrated a 20.63% improvement in terms of F-score compared with that of more commonly used methods. The findings suggest that the proposed framework is both effective and efficient. A web server and software tools are available via http://datamining.xmu.edu.cn/prht/ or http://prht.sinaapp.com/. (C) 2016 Elsevier Inc. All rights reserved.
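
To make the abstract's point about the 0.5 default concrete, the following is a minimal illustrative sketch in Python. It is not the framework proposed in the paper, whose details are not given here. It simply sweeps candidate thresholds over classifier scores on a labelled validation set and keeps the one with the highest F-score. The function names `f_score` and `best_threshold` and the toy data are assumptions made for illustration only.

```python
from typing import Sequence, Tuple


def f_score(y_true: Sequence[int], y_pred: Sequence[int], beta: float = 1.0) -> float:
    """F-beta score for binary labels, where 1 is the (minority) positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision or recall is zero, so the score is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def best_threshold(y_true: Sequence[int], scores: Sequence[float]) -> Tuple[float, float]:
    """Return (threshold, F-score) maximizing F1 over the observed score values.

    Hypothetical helper: a plain exhaustive sweep, not the paper's method.
    A sample is predicted positive when its score is >= the threshold.
    """
    best_t, best_f = 0.5, -1.0
    for t in sorted(set(scores)):
        preds = [1 if s >= t else 0 for s in scores]
        f = f_score(y_true, preds)
        if f > best_f:
            best_t, best_f = t, f
    return best_t, best_f


# Toy imbalanced validation set (made-up numbers): 8 negatives, 2 positives.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.05, 0.10, 0.12, 0.20, 0.22, 0.30, 0.35, 0.45, 0.40, 0.48]

default_preds = [1 if s >= 0.5 else 0 for s in scores]
print("F1 at threshold 0.5:", f_score(y_true, default_preds))
print("Best (threshold, F1):", best_threshold(y_true, scores))
```

In this toy data no score reaches 0.5, so the default threshold predicts no positives at all and its F-score is zero, whereas a lower threshold recovers both positives. In practice the threshold would be chosen on held-out validation data and then applied unchanged to the test or unknown data.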
