Article

Instance-Dependent Positive and Unlabeled Learning With Labeling Bias Estimation

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TPAMI.2021.3061456

Keywords

Instance-dependent PU learning; labeling bias; maximum likelihood estimation; solution uniqueness; generalization bound

Funding

  1. National Science Foundation (NSF) of China [61973162, U1713208, 62006202]
  2. Fundamental Research Funds for the Central Universities [30920032202]
  3. CCF-Tencent Open Fund [RAGR20200101]
  4. Young Elite Scientists Sponsorship Program by CAST [2018QNRC001]
  5. Hong Kong Scholars Program [XJ2019036]
  6. 111 Program [B13022]
  7. RGC Early Career Scheme [22200720]
  8. HKBU CSD Departmental Incentive Grant
  9. Hong Kong Polytechnic University [YZ3K, UAJP/UAGK, ZVRH]
  10. ARC [DE190101473]

Abstract

This paper proposes an instance-dependent Positive and Unlabeled (PU) classification algorithm based on graphical models and optimization techniques like EM and Adam, which effectively obtains the labeling probability of positive examples and the classifier based on observations. Theoretical analysis proves the existence of critical solutions and their local uniqueness, along with an upper bound estimation of generalization error for the algorithm. Empirical results demonstrate the significant advantage of the proposed method over existing PU approaches on various datasets.
This paper studies instance-dependent Positive and Unlabeled (PU) classification, where whether a positive example will be labeled (indicated by s) is related not only to the class label y but also to the observation x. Therefore, the labeling probability on positive examples is not uniform as previous works assumed, but is biased toward some simple or critical data points. To depict this dependency relationship, a graphical model is built which further leads to a maximization problem on the induced likelihood function regarding P(s, y | x). By utilizing the well-known EM and Adam optimization techniques, the labeling probability of any positive example P(s = 1 | y = 1, x), as well as the classifier induced by P(y | x), can be acquired. Theoretically, we prove that the critical solution always exists and is locally unique for the linear model if some sufficient conditions are met. Moreover, we upper bound the generalization error for both the linear logistic and non-linear network instantiations of our algorithm, with the convergence rate of the expected risk to the empirical risk being O(1/√k + 1/√(n−k) + 1/√n), where k and n are the sizes of the positive set and the entire training set, respectively. Empirically, we compare our method with state-of-the-art instance-independent and instance-dependent PU algorithms on a wide range of synthetic, benchmark, and real-world datasets, and the experimental results firmly demonstrate the advantage of the proposed method over existing PU approaches.
