☆ 4.7 Article

Clustering large graphs via the Singular Value Decomposition

MACHINE LEARNING (2004)

期刊

MACHINE LEARNING

卷 56, 期 1-3, 页码 9-33

出版社

SPRINGER

DOI: 10.1023/B:MACH.0000033113.59016.96

关键词

Singular Value Decomposition; randomized algorithms; k-means clustering

类别

Computer Science, Artificial Intelligence

向作者/读者索取更多资源

Protocol

社区支持

Reagent

社区支持

摘要

We consider the problem of partitioning a set of m points in the n-dimensional Euclidean space into k clusters ( usually m and n are variable, while k is fixed), so as to minimize the sum of squared distances between each point and its cluster center. This formulation is usually the objective of the k-means clustering algorithm (Kanungo et al. ( 2000)). We prove that this problem in NP-hard even for k = 2, and we consider a continuous relaxation of this discrete problem: find the k-dimensional subspace V that minimizes the sum of squared distances to V of the m points. This relaxation can be solved by computing the Singular Value Decomposition (SVD) of the m x n matrix A that represents the m points; this solution can be used to get a 2-approximation algorithm for the original problem. We then argue that in fact the relaxation provides a generalized clustering which is useful in its own right. Finally, we show that the SVD of a random submatrix-chosen according to a suitable probability distribution of a given matrix provides an approximation to the SVD of the whole matrix, thus yielding a very fast randomized algorithm. We expect this algorithm to be the main contribution of this paper, since it can be applied to problems of very large size which typically arise in modern applications.

Clustering large graphs via the Singular Value Decomposition

期刊

MACHINE LEARNING

出版社

SPRINGER

关键词

类别

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

Clustering large graphs via the Singular Value Decomposition

期刊

MACHINE LEARNING

出版社

SPRINGER

关键词

类别

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

导出引文

分享论文