Journal
2016 50TH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS AND COMPUTERS
Volume -, Issue -, Pages 1014-1018
Publisher
IEEE COMPUTER SOC
Funding
- NSF [1447822]
- Directorate for Computer & Information Science & Engineering
- Division of Information & Intelligent Systems [1447822] Funding Source: National Science Foundation
Given data that lies in a union of low-dimensional subspaces, the problem of subspace clustering aims to learn, in an unsupervised manner, the membership of the data to their respective subspaces. State-of-the-art subspace clustering methods typically adopt a two-step procedure. In the first step, an affinity measure among data points is constructed, usually by exploiting some form of data self-representation. In the second step, spectral clustering is applied to the affinity measure to find the membership of the data to their respective subspaces. While such methods are broadly applicable to mid-size datasets with 10,000 data points in 10,000 variables, they cannot be directly applied to large-scale datasets. This paper proposes a divide-and-conquer framework for large-scale subspace clustering. The data is first divided into chunks, and subspace clustering is applied to each chunk. After removing potential outliers from each cluster, a new cross-representation measure for the similarity between subspaces is used to merge clusters from different chunks that correspond to the same subspace. A self-representation method is then used to assign outliers to clusters. We evaluate the proposed strategy on a synthetic large-scale dataset with 1,000,000 data points, as well as on the MNIST database, which contains 70,000 images of handwritten digits. The numerical results highlight the scalability of our approach.
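The abstract does not define the cross-representation measure, so the following is only a minimal sketch of the general idea behind the merging step, under stated assumptions. It scores two clusters by how well the points of one are reconstructed, in the least-squares sense, from the points of the other. The function name `cross_representation_residual`, the relative-residual score, and the synthetic noiseless data are all illustrative choices, not the paper's definitions.

```python
import numpy as np


def cross_representation_residual(A, B, rcond=1e-10):
    """Relative residual when the columns of A are fit by the columns of B.

    Hypothetical similarity score (an assumption, not the paper's measure):
    a value near 0 suggests both clusters lie in the same subspace.
    """
    # Explicit rcond discards numerically zero singular values of B, so that
    # round-off noise does not make B appear to span the whole ambient space.
    coeffs, *_ = np.linalg.lstsq(B, A, rcond=rcond)
    return np.linalg.norm(A - B @ coeffs) / np.linalg.norm(A)


def sample_subspace_points(basis, n, rng):
    """Draw n noiseless points (as columns) from the subspace spanned by basis."""
    return basis @ rng.standard_normal((basis.shape[1], n))


rng = np.random.default_rng(0)
ambient_dim, sub_dim = 20, 3

# Two random 3-dimensional subspaces of a 20-dimensional ambient space.
basis_1 = np.linalg.qr(rng.standard_normal((ambient_dim, sub_dim)))[0]
basis_2 = np.linalg.qr(rng.standard_normal((ambient_dim, sub_dim)))[0]

# One cluster found in chunk A, and two candidate clusters from chunk B.
cluster_a = sample_subspace_points(basis_1, 30, rng)
cluster_b_same = sample_subspace_points(basis_1, 30, rng)
cluster_b_other = sample_subspace_points(basis_2, 30, rng)

same = cross_representation_residual(cluster_a, cluster_b_same)
other = cross_representation_residual(cluster_a, cluster_b_other)

print(f"residual, same subspace:      {same:.2e}")
print(f"residual, different subspace: {other:.2e}")
```

In this noiseless setting, the cluster pair drawn from the same subspace yields a residual close to zero, while the pair from different subspaces yields a large one, so thresholding the score would decide which clusters to merge. With noisy data or outliers, a regularized or sparse representation would likely be needed in place of plain least squares.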