Article

Cost-sensitive active learning through statistical methods

Journal

INFORMATION SCIENCES
Volume 501, Pages 460-482

Publisher

ELSEVIER SCIENCE INC
DOI: 10.1016/j.ins.2019.06.015

Keywords

Active learning; Clustering; Cost-sensitive; Three-way decision

Funding

  1. Natural Science Foundation of Sichuan Province [2017JY0190]
  2. Scientific Innovation Group for Youths of Sichuan Province [2019JDTD0017]
  3. State Administration of Work Safety project [Sichuan-0008-2016AQ, Sichuan-0009-2016AQ]
  4. Ministry of Education Innovation Project [201801140013, 201801006094]
  5. National Natural Science Foundation of China [61876157, 71571148]

Clustering-based active learning splits the data into a number of blocks and queries the labels of the most representative instances. When the costs of labeling and misclassification are considered, a key issue arises: how many labels should be queried for a given block? In this paper, we present theoretical and practical statistical methods to handle this issue. The theoretical statistical method calculates the optimal number of query labels for a predefined label distribution. Considering label distributions for different clustering qualities, we obtain three hypothetical models, namely the Gaussian, Uniform, and V models. The practical statistical method calculates the empirical label distribution of the cluster blocks. Considering four popular clustering algorithms, we use symmetry and curve fitting techniques on 30 datasets to obtain empirical distributions. Inspired by three-way decision, we design an algorithm called cost-sensitive active learning through statistical methods (CATS). Experiments were performed on 12 binary-class datasets for both the distribution evaluation and the learning task. The results of significance tests verify the effectiveness of CATS and its superior performance with respect to state-of-the-art cost-sensitive active learning algorithms. © 2019 Elsevier Inc. All rights reserved.
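
The trade-off behind "how many labels should be queried for a given block" can be illustrated with a small toy model. The sketch below is not the CATS algorithm and not the paper's formulation; it is a minimal example under stated assumptions: a binary-class block of `n` instances, a fixed cost per queried label, a fixed cost per misclassified instance, a uniform prior on the block's positive fraction, and majority-vote prediction for the unqueried instances. All function and parameter names (`expected_error_rate`, `expected_cost`, `best_query_count`, `teacher_cost`, `mis_cost`) are hypothetical.

```python
from fractions import Fraction


def expected_error_rate(k):
    """Expected error of majority-vote prediction after k queried labels.

    Assumption (illustrative only): the block's positive fraction has a
    uniform prior on [0, 1].
    """
    # Under a uniform prior, the number of positives r among k queries is
    # uniformly distributed on {0, ..., k}, and Laplace's rule of succession
    # gives P(next instance is positive | r) = (r + 1) / (k + 2).
    total = Fraction(0)
    for r in range(k + 1):
        p_pos = Fraction(r + 1, k + 2)
        # Predicting the more probable class errs with the smaller probability.
        total += min(p_pos, 1 - p_pos)
    return total / (k + 1)


def expected_cost(n, k, teacher_cost, mis_cost):
    """Labeling cost for k queries plus expected misclassification cost
    for the n - k instances that are predicted rather than queried."""
    return teacher_cost * k + mis_cost * (n - k) * expected_error_rate(k)


def best_query_count(n, teacher_cost, mis_cost):
    """Number of queries in 0..n that minimizes the expected total cost
    (ties resolve to the smallest count)."""
    return min(
        range(n + 1),
        key=lambda k: expected_cost(n, k, teacher_cost, mis_cost),
    )
```

Calling `best_query_count(n, teacher_cost, mis_cost)` for a few cost ratios shows the expected behavior of such a model: when labels are free, every instance is queried, and when misclassification costs nothing, none are. A different assumed label distribution would change `expected_error_rate` and therefore the optimum.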
