Article

Self-Representation Based Unsupervised Exemplar Selection in a Union of Subspaces

Journal

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TPAMI.2020.3035599

Keywords

Unsupervised exemplar selection; imbalanced data; large-scale data; subspace clustering

Funding

  1. Northrop Grumman Research in Applications for Learning Machines (REALM) Initiative
  2. NSF 1618637
  3. IARPA 127228
  4. NSF 1934931


This paper introduces a new exemplar selection model that chooses representative samples according to how well they reconstruct and cover the data points. A farthest-first search algorithm selects such samples efficiently. The paper also develops an efficient and robust subspace clustering method for imbalanced data.
Finding a small set of representatives from an unlabeled dataset is a core problem in a broad range of applications such as dataset summarization and information extraction. Classical exemplar selection methods such as k-medoids work under the assumption that the data points are close to a few cluster centroids, and cannot handle the case where data lie close to a union of subspaces. This paper proposes a new exemplar selection model that searches for a subset that best reconstructs all data points as measured by the ℓ1 norm of the representation coefficients. Geometrically, this subset best covers all the data points as measured by the Minkowski functional of the subset. To solve our model efficiently, we introduce a farthest-first search algorithm that iteratively selects the worst-represented point as an exemplar. When the dataset is drawn from a union of independent subspaces, our method is able to select sufficiently many representatives from each subspace. We further develop an exemplar-based subspace clustering method that is robust to imbalanced data and efficient for large-scale data. Moreover, we show that a classifier trained on the selected exemplars (when they are labeled) can correctly classify the rest of the data points.
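
The farthest-first idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that a point's representation cost is the Lasso-style objective min_c ||c||_1 + (lam/2) ||x - D c||^2 over the current exemplars D, solved here with a basic ISTA loop. The function names, the trade-off parameter lam, the iteration count, and the random choice of the first exemplar are all hypothetical choices made for illustration.

```python
import numpy as np


def representation_cost(x, D, lam=50.0, n_iter=200):
    """Assumed cost: min_c ||c||_1 + (lam / 2) * ||x - D c||^2, solved by ISTA.

    x is a (d,) data point; D is a (d, m) matrix whose columns are the
    currently selected exemplars.
    """
    c = np.zeros(D.shape[1])
    # Lipschitz constant of the gradient of the smooth (quadratic) term.
    lipschitz = lam * np.linalg.norm(D, 2) ** 2
    if lipschitz == 0.0:
        return 0.5 * lam * float(x @ x)
    for _ in range(n_iter):
        grad = lam * (D.T @ (D @ c - x))
        z = c - grad / lipschitz
        # Soft-thresholding is the proximal step for the l1 term.
        c = np.sign(z) * np.maximum(np.abs(z) - 1.0 / lipschitz, 0.0)
    residual = x - D @ c
    return float(np.abs(c).sum() + 0.5 * lam * (residual @ residual))


def farthest_first_exemplars(X, k, lam=50.0, seed=0):
    """Greedily pick k column indices of X (shape (d, n)).

    Each round adds the point that the current exemplars represent worst,
    i.e. the point with the largest representation cost.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    selected = [int(rng.integers(n))]  # assumption: random first exemplar
    while len(selected) < k:
        D = X[:, selected]
        costs = np.array(
            [representation_cost(X[:, j], D, lam) for j in range(n)]
        )
        costs[selected] = -np.inf  # never re-select an existing exemplar
        selected.append(int(np.argmax(costs)))
    return selected
```

On imbalanced toy data, for example 20 points on one line through the origin and 2 points on an orthogonal line, calling farthest_first_exemplars(X, k=2) picks one point from each line under these assumptions, because points on the line without an exemplar cannot be reconstructed and so have the largest cost.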
