Article

Learning from Noisy Pairwise Similarity and Unlabeled Data

Journal

JOURNAL OF MACHINE LEARNING RESEARCH
Volume 23, Issue -, Pages -

Publisher

MICROTOME PUBL

Keywords

privacy concern; similarity learning; unbiased classifier

Funding

  1. Australian Research Council [IC-190100031, DP-220102121, FT-220100318]
  2. RIKEN Collaborative Research Fund
  3. RGC Early Career Scheme [22200720]
  4. NSFC [62006202]
  5. Guangdong Basic and Applied Basic Research Foundation [2022A1515011652]
  6. Natural Science Foundation of China [62276242, CAAIXSJLJJ-2021-016B, CAAIXSJLJJ-2022-001A]
  7. Anhui Province Key Research and Development Program [202104a05020007]
  8. JST AIP Acceleration Research Grant [JPMJCR20U3]
  9. Institute for AI and Beyond, UTokyo

SU classification is a method that uses similar data pairs and unlabeled data to build classifiers, providing an alternative to supervised classifiers that require labeled data points. However, SU classification has limitations due to the possibility of respondents answering questions in a favorable manner instead of truthfully. This paper studies how to learn from noisy similar data pairs and unlabeled data, proposing an algorithm for nSU classification.
SU classification employs similar (S) data pairs (two examples that belong to the same class) and unlabeled (U) data points to build a classifier, which can serve as an alternative to standard supervised classifiers that require data points with class labels. SU classification is advantageous because, in the era of big data, more attention has been paid to data privacy. Datasets with specific class labels are often difficult to obtain in real-world classification applications involving privacy-sensitive matters, such as politics and religion, which can be a bottleneck in supervised classification. Fortunately, similarity labels do not reveal explicit class information and inherently protect privacy, e.g., collecting answers to "With whom do you share the same opinion on issue I?" instead of "What is your opinion on issue I?". Nevertheless, SU classification still has an obvious limitation: respondents might answer these questions in a manner that is viewed favorably by others instead of answering truthfully. As a result, some dissimilar data pairs are labeled as similar, which significantly degrades the performance of SU classification. In this paper, we study how to learn from noisy similar (nS) data pairs and unlabeled (U) data, which is called nSU classification. Specifically, we carefully model the similarity noise and estimate the noise rate by using the mixture proportion estimation technique. Then, a clean classifier can be learned by minimizing a denoised and unbiased classification risk estimator, which only involves the noisy data. Moreover, we further derive a theoretical generalization error bound for the proposed method. Experimental results demonstrate the effectiveness of the proposed algorithm on several benchmark datasets.
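To make the problem setting concrete, the following is a minimal simulation sketch of how noisy similar pairs can arise. It is not the paper's estimator or algorithm. It assumes a simple noise model in which every truly similar pair is reported as similar, while a dissimilar pair is reported as similar with a constant probability. The names `simulate_reported_similar_pairs`, `expected_noise_rate`, and `flip_prob` are hypothetical and chosen only for illustration. Under this assumption, the fraction of dissimilar pairs among the reported similar pairs has a closed form, which the simulation should approximately reproduce.

```python
import random


def simulate_reported_similar_pairs(labels, n_draws, flip_prob, seed=0):
    """Draw random pairs and keep those a respondent reports as 'similar'.

    Assumed noise model (illustrative only): truly similar pairs are always
    reported as similar; dissimilar pairs are reported as similar with
    probability flip_prob.
    """
    rng = random.Random(seed)
    n = len(labels)
    reported = []
    for _ in range(n_draws):
        # Two independent uniform draws, so P(same class) = pi_+^2 + pi_-^2.
        i, j = rng.randrange(n), rng.randrange(n)
        truly_similar = labels[i] == labels[j]
        if truly_similar or rng.random() < flip_prob:
            reported.append((i, j, truly_similar))
    return reported


def expected_noise_rate(pi_pos, flip_prob):
    """P(dissimilar | reported similar) under the assumed noise model."""
    pi_s = pi_pos ** 2 + (1.0 - pi_pos) ** 2   # probability a random pair is similar
    mislabeled = flip_prob * (1.0 - pi_s)      # dissimilar pairs reported as similar
    return mislabeled / (pi_s + mislabeled)


# Hypothetical data: 60% positive, 40% negative; 30% of dissimilar pairs flipped.
labels = [+1] * 600 + [-1] * 400
pairs = simulate_reported_similar_pairs(labels, n_draws=100_000, flip_prob=0.3)
empirical = sum(1 for _, _, similar in pairs if not similar) / len(pairs)
analytic = expected_noise_rate(pi_pos=0.6, flip_prob=0.3)
print(f"empirical noise rate: {empirical:.3f}, analytic: {analytic:.3f}")
```

In practice the true labels are unavailable, so the noise rate cannot be counted directly as in this simulation; this is why the abstract describes estimating it from data with mixture proportion estimation.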
