Article

Fast Monte Carlo algorithms for matrices III: Computing a compressed approximate matrix decomposition

Journal

SIAM JOURNAL ON COMPUTING
Volume 36, Issue 1, Pages 184-206

Publisher

SIAM PUBLICATIONS
DOI: 10.1137/S0097539704442702

Keywords

randomized algorithms; Monte Carlo methods; massive data sets; CUR matrix decomposition

In many applications, the data consist of (or may be naturally formulated as) an $m \times n$ matrix $A$ which may be stored on disk but which is too large to be read into random access memory (RAM) or to practically perform superlinear polynomial time computations on it. Two algorithms are presented which, when given an $m \times n$ matrix $A$, compute approximations to $A$ which are the product of three smaller matrices, $C$, $U$, and $R$, each of which may be computed rapidly. Let $A' = CUR$ be the computed approximate decomposition; both algorithms have provable bounds for the error matrix $A - A'$. In the first algorithm, $c$ columns of $A$ and $r$ rows of $A$ are randomly chosen. If the $m \times c$ matrix $C$ consists of those $c$ columns of $A$ (after appropriate rescaling) and the $r \times n$ matrix $R$ consists of those $r$ rows of $A$ (also after appropriate rescaling), then the $c \times r$ matrix $U$ may be calculated from $C$ and $R$. For any matrix $X$, let $\|X\|_F$ and $\|X\|_2$ denote its Frobenius norm and its spectral norm, respectively. It is proven that $\|A - A'\|_\xi \le \min_{D:\,\mathrm{rank}(D) \le k} \|A - D\|_\xi + \mathrm{poly}(k, 1/c)\,\|A\|_F$ holds in expectation and with high probability for both $\xi = 2, F$ and for all $k = 1, \ldots, \mathrm{rank}(A)$; thus, by appropriate choice of $k$, $\|A - A'\|_2 \le \epsilon \|A\|_F$ also holds in expectation and with high probability. This algorithm may be implemented without storing the matrix $A$ in RAM, provided it can make two passes over the matrix stored in external memory and use $O(m + n)$ additional RAM (assuming that $c$ and $r$ are constants, independent of the size of the input). The second algorithm is similar except that it approximates the matrix $C$ by randomly sampling a constant number of rows of $C$. Thus, it has additional error, but it can be implemented in three passes over the matrix using only constant additional RAM.
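The first algorithm can be sketched in a few lines of NumPy. This is a minimal, hedged sketch, not the authors' exact procedure. It assumes norm-squared sampling probabilities ($p_j = |A^{(j)}|^2/\|A\|_F^2$ for columns, $q_i = |A_{(i)}|^2/\|A\|_F^2$ for rows), sampling with replacement, and rescaling by $1/\sqrt{c\,p_j}$ and $1/\sqrt{r\,q_i}$. It also assumes one way of computing $U$ from $C$ and $R$: let $\Phi$ be the rank-$k$ pseudo-inverse of $C^T C$, let $\Psi$ be the $r \times c$ matrix of the same sampled and rescaled rows of $C$, and set $U = \Phi\Psi^T$. The reasoning is that $C\Phi C^T$ projects onto the top-$k$ left singular vectors of $C$, while $\Psi^T R$ is a sampled estimate of $C^T A$, so $CUR$ approximates that projection of $A$. The function name `linear_time_cur_sketch` is hypothetical.

```python
import numpy as np


def linear_time_cur_sketch(A, c, r, k, seed=0):
    """Hypothetical CUR sketch: returns C (m x c), U (c x r), R (r x n).

    Assumes norm-squared sampling with replacement and that the top-k
    singular values of C are nonzero (k <= rank(C)).
    """
    rng = np.random.default_rng(seed)
    fro2 = np.sum(A**2)  # ||A||_F^2

    # Columns: p_j = |A^(j)|^2 / ||A||_F^2, rescale by 1/sqrt(c * p_j).
    p = np.sum(A**2, axis=0) / fro2
    cols = rng.choice(A.shape[1], size=c, replace=True, p=p)
    C = A[:, cols] / np.sqrt(c * p[cols])

    # Rows: q_i = |A_(i)|^2 / ||A||_F^2, rescale by 1/sqrt(r * q_i).
    q = np.sum(A**2, axis=1) / fro2
    rows = rng.choice(A.shape[0], size=r, replace=True, p=q)
    row_scale = 1.0 / np.sqrt(r * q[rows])
    R = A[rows, :] * row_scale[:, None]
    # Psi: the same sampled rows of C with the same rescaling (r x c).
    Psi = C[rows, :] * row_scale[:, None]

    # Phi: rank-k pseudo-inverse of C^T C, built from the SVD of C.
    _, sigma, Vt = np.linalg.svd(C, full_matrices=False)
    Yk = Vt[:k].T                                    # c x k right singular vectors
    Phi = Yk @ np.diag(1.0 / sigma[:k] ** 2) @ Yk.T  # c x c

    U = Phi @ Psi.T  # c x r
    return C, U, R


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Rank-5 matrix plus a little noise.
    A = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 150))
    A += 0.01 * rng.standard_normal(A.shape)
    C, U, R = linear_time_cur_sketch(A, c=40, r=40, k=5)
    rel_err = np.linalg.norm(A - C @ U @ R) / np.linalg.norm(A)
    print(f"relative Frobenius error: {rel_err:.3f}")
```

The sketch keeps $A$ in memory for clarity; it does not implement the two-pass, $O(m + n)$-RAM scheme described above. One convenient sanity check is that when $A$ has rank one and $k = 1$, this construction recovers $A$ exactly, regardless of which columns and rows are drawn.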
To achieve an additional error (beyond the best rank-$k$ approximation) that is at most $\epsilon \|A\|_F$, both algorithms take time which is a low-degree polynomial in $k$, $1/\epsilon$, and $1/\delta$, where $\delta > 0$ is a failure probability; the first takes time linear in $\max(m, n)$ and the second takes time independent of $m$ and $n$. The proofs for the error bounds make important use of matrix perturbation theory and previous work on approximating matrix multiplication and computing low-rank approximations to a matrix. The probability distribution over columns and rows and the rescaling are crucial features of the algorithms and must be chosen judiciously.
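Why the sampling distribution and the rescaling matter can be seen on the simpler matrix-multiplication subproblem $AA^T \approx CC^T$. This is a hedged illustration, not code from the paper. Suppose $C$ holds $c$ columns of $A$, drawn i.i.d. with probabilities $p_j$ and each rescaled by $1/\sqrt{c\,p_j}$. Then $\mathbb{E}[CC^T] = AA^T$ and $\mathbb{E}\|AA^T - CC^T\|_F^2 = \frac{1}{c}\left(\sum_j |A^{(j)}|^4/p_j - \|AA^T\|_F^2\right)$. By Cauchy-Schwarz, the sum is minimized by norm-squared probabilities $p_j \propto |A^{(j)}|^2$, which give an expected error of $(\|A\|_F^4 - \|AA^T\|_F^2)/c$. The function names `norm_squared_probs`, `expected_product_error`, and `sampled_CCt` are hypothetical.

```python
import numpy as np


def norm_squared_probs(A):
    """Column probabilities p_j = |A^(j)|^2 / ||A||_F^2."""
    col2 = np.sum(A**2, axis=0)
    return col2 / col2.sum()


def expected_product_error(A, p, c):
    """Closed-form E||A A^T - C C^T||_F^2 for rescaled column sampling.

    C holds c columns of A drawn i.i.d. with probabilities p (all p_j > 0
    assumed), each rescaled by 1/sqrt(c * p_j), so that E[C C^T] = A A^T.
    """
    col_norm4 = np.sum(A**2, axis=0) ** 2    # |A^(j)|^4
    aat_fro2 = np.linalg.norm(A @ A.T) ** 2  # ||A A^T||_F^2
    return (np.sum(col_norm4 / p) - aat_fro2) / c


def sampled_CCt(A, p, c, rng):
    """One random draw of the rescaled estimator C C^T."""
    idx = rng.choice(A.shape[1], size=c, replace=True, p=p)
    C = A[:, idx] / np.sqrt(c * p[idx])
    return C @ C.T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Columns with very different norms.
    A = rng.standard_normal((10, 6)) * np.array([10.0, 1.0, 1.0, 1.0, 1.0, 0.1])
    c = 3
    p_ns = norm_squared_probs(A)
    uniform = np.full(A.shape[1], 1.0 / A.shape[1])
    print("norm-squared:", expected_product_error(A, p_ns, c))
    print("uniform:     ", expected_product_error(A, uniform, c))
    # Averaging many draws should approach A A^T (unbiasedness).
    avg = np.mean([sampled_CCt(A, p_ns, c, rng) for _ in range(5000)], axis=0)
    print("mean abs deviation:", np.abs(avg - A @ A.T).mean())
```

For a matrix with uneven column norms, uniform sampling gives a strictly larger expected error than norm-squared sampling. The two coincide only when all columns have equal norm.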
