Journal
JOURNAL OF MACHINE LEARNING RESEARCH
Volume 23
Publisher
MICROTOME PUBL
Keywords
privacy concern; similarity learning; unbiased classifier
Funding
- Australian Research Council [IC-190100031, DP-220102121, FT-220100318]
- RIKEN Collaborative Research Fund
- RGC Early Career Scheme [22200720]
- NSFC [62006202]
- Guangdong Basic and Applied Basic Research Foundation [2022A1515011652]
- Natural Science Foundation of China [62276242, CAAIXSJLJJ-2021-016B, CAAIXSJLJJ-2022-001A]
- Anhui Province Key Research and Development Program [202104a05020007]
- JST AIP Acceleration Research Grant [JPMJCR20U3]
- Institute for AI and Beyond, UTokyo
SU classification builds classifiers from similar data pairs and unlabeled data, offering an alternative to supervised classifiers that require individually labeled data points. However, SU classification is vulnerable to respondents answering similarity questions in a socially favorable manner rather than truthfully. This paper studies how to learn from noisy similar data pairs and unlabeled data, proposing an algorithm for nSU classification.
SU classification employs similar (S) data pairs (two examples belonging to the same class) and unlabeled (U) data points to build a classifier, which can serve as an alternative to standard supervised classifiers that require data points with class labels. SU classification is advantageous because, in the era of big data, more attention has been paid to data privacy. Datasets with specific class labels are often difficult to obtain in real-world classification applications involving privacy-sensitive matters, such as politics and religion, which can be a bottleneck in supervised classification. Fortunately, similarity labels do not reveal explicit class information and inherently protect privacy, e.g., by collecting answers to "With whom do you share the same opinion on issue I?" instead of "What is your opinion on issue I?". Nevertheless, SU classification still has an obvious limitation: respondents might answer these questions in a manner that is viewed favorably by others instead of answering truthfully. Therefore, some dissimilar data pairs are labeled as similar, which significantly degrades the performance of SU classification. In this paper, we study how to learn from noisy similar (nS) data pairs and unlabeled (U) data, which we call nSU classification. Specifically, we carefully model the similarity noise and estimate the noise rate using the mixture proportion estimation technique. Then, a clean classifier can be learned by minimizing a denoised and unbiased classification risk estimator, which involves only the noisy data. Moreover, we derive a theoretical generalization error bound for the proposed method. Experimental results demonstrate the effectiveness of the proposed algorithm on several benchmark datasets.
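The abstract's denoising idea rests on a mixture identity: if a fraction rho of the observed "similar" pairs are actually dissimilar, the expected loss over the noisy pairs is a convex mixture of the clean similar-pair and dissimilar-pair expectations, so the clean expectation can be recovered by inversion. The following is a toy numerical sketch of that inversion only, with synthetic loss values and a known rho; it is not the paper's actual risk estimator (which involves unlabeled data and an estimated noise rate), and all quantities here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
rho = 0.3  # noise rate; in the paper this is estimated via mixture proportion estimation
n = 100_000

# Hypothetical per-pair losses for truly similar vs. truly dissimilar pairs.
loss_similar = rng.normal(0.2, 0.05, size=n)
loss_dissimilar = rng.normal(0.9, 0.05, size=n)

# Observed "similar" pairs are a mixture: (1 - rho) clean similar + rho flipped dissimilar.
is_clean = rng.random(n) < 1 - rho
loss_noisy = np.where(is_clean,
                      rng.normal(0.2, 0.05, size=n),
                      rng.normal(0.9, 0.05, size=n))

# Mixture inversion: E_S[loss] = (E_noisy[loss] - rho * E_D[loss]) / (1 - rho)
debiased = (loss_noisy.mean() - rho * loss_dissimilar.mean()) / (1 - rho)
print(round(debiased, 2), round(loss_similar.mean(), 2))  # both close to 0.2
```

The same inversion applied inside a risk estimator is what makes the corrected objective unbiased: minimizing the debiased quantity in expectation equals minimizing the clean similar-pair risk, even though only noisy pairs are observed.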