Article

Loss Decomposition and Centroid Estimation for Positive and Unlabeled Learning

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TPAMI.2019.2941684

Keywords

PU learning; loss decomposition; centroid estimation; kernel extension; generalization bound

Funding

  1. NSF of China [61602246, 61973162, U1713208]
  2. NSF of Jiangsu Province [BK20171430]
  3. Fundamental Research Funds for the Central Universities [30918011319]
  4. open project of State Key Laboratory of Integrated Services Networks (Xidian University) [ISN19-03]
  5. Summit of the Six Top Talents Program [DZXX-027]
  6. Young Elite Scientists Sponsorship Program by Jiangsu Province
  7. Young Elite Scientists Sponsorship Program by CAST [2018QNRC001]
  8. Program for Changjiang Scholars
  9. 111 Program [AH92005]
  10. ARC [FL-170100117, DP180103424, DE190101473]

Abstract

This paper studies PU learning and proposes the LDCE and KLDCE algorithms, which address the one-sided label noise that arises from treating unlabeled data as negatives and employ the kernel trick for nonlinear classification, achieving top-level performance in experiments.
This paper studies Positive and Unlabeled learning (PU learning), whose goal is to build a binary classifier when only positive data and unlabeled data are available for training. To deal with the absence of negative training data, we first regard all unlabeled data as negative examples carrying possibly false negative labels, and then convert PU learning into a risk minimization problem under such one-sided label noise. Specifically, we propose a novel PU learning algorithm dubbed Loss Decomposition and Centroid Estimation (LDCE). By decomposing the loss function on the corrupted negative examples into two parts, we show that only the second part is affected by the noisy labels. Thereby, we can estimate the centroid of the corrupted negative set in an unbiased way and reduce the adverse impact of the label noise. Furthermore, we propose Kernelized LDCE (KLDCE) by introducing the kernel trick, and show that KLDCE can be solved easily by combining Alternative Convex Search (ACS) with Sequential Minimal Optimization (SMO). Theoretically, we derive a generalization error bound which suggests that the generalization risk of our model converges to the empirical risk at the rate O(1/√k + 1/√(n−k) + 1/√n), where n and k are the numbers of training examples and positive examples, respectively. Experimentally, we conduct extensive experiments on a synthetic dataset, UCI benchmark datasets, and real-world datasets, and the results demonstrate that our approaches (LDCE and KLDCE) achieve top-level performance compared with both classic and state-of-the-art PU learning methods.
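
The abstract describes the centroid-estimation idea only at a high level. The sketch below illustrates one standard way a negative-class centroid can be estimated without bias from PU data, assuming a known positive class prior and an unlabeled set drawn i.i.d. from the mixture of the two classes; the function name, the `prior_pos` argument, and the toy data are illustrative assumptions, not the authors' actual LDCE implementation.

```python
import numpy as np

def estimate_negative_centroid(X_pos, X_unlab, prior_pos):
    """Unbiased negative-class centroid estimate from PU data.

    Assumes the unlabeled sample follows the mixture
        E[x | unlabeled] = prior_pos * E[x | positive]
                           + (1 - prior_pos) * E[x | negative],
    so the negative mean is recovered by de-mixing the two sample means.
    """
    mu_pos = np.asarray(X_pos).mean(axis=0)      # centroid of labeled positives
    mu_unlab = np.asarray(X_unlab).mean(axis=0)  # centroid of unlabeled data
    return (mu_unlab - prior_pos * mu_pos) / (1.0 - prior_pos)

# Toy usage (hypothetical data): positives around +1, negatives around -1.
rng = np.random.default_rng(0)
pi = 0.4                                         # assumed positive class prior
X_pos = rng.normal(1.0, 0.5, size=(200, 2))      # labeled positive examples
is_pos = rng.random(500) < pi                    # latent labels of unlabeled points
X_unlab = np.where(is_pos[:, None],
                   rng.normal(1.0, 0.5, (500, 2)),
                   rng.normal(-1.0, 0.5, (500, 2)))
print(estimate_negative_centroid(X_pos, X_unlab, pi))  # close to [-1, -1]
```

The estimator is unbiased only under the stated mixture assumption and with the true class prior; in practice the prior itself typically has to be estimated from the PU data.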
