Abstract
Much recent work in theoretical computer science, linear algebra, and machine learning has considered matrix decompositions of the following form: given an m × n matrix A, decompose it as a product of three matrices, C, U, and R, where C consists of a small number of columns of A, R consists of a small number of rows of A, and U is a small, carefully constructed matrix that guarantees that the product CUR is “close” to A. Applications of such decompositions include the computation of matrix “sketches”, speeding up kernel-based statistical learning, preserving sparsity in low-rank matrix representations, and improved interpretability of data analysis methods. Our main result is a randomized, polynomial-time algorithm which, given as input an m × n matrix A, returns as output matrices C, U, R such that
||A − CUR||_F ≤ (1 + ε) ||A − A_k||_F
with probability at least 1 − δ. Here, A_k is the “best” rank-k approximation (provided by truncating the Singular Value Decomposition of A), and ||X||_F is the Frobenius norm of the matrix X. The number of columns in C and rows in R is a low-degree polynomial in k, 1/ε, and log(1/δ). Our main result is obtained by extending our recent relative-error approximation algorithm for ℓ2 regression from overconstrained problems to general ℓ2 regression problems. Our algorithm is simple, and its running time is of the order of the time needed to compute the top k right singular vectors of A. In addition, it samples the columns and rows of A via the method of “subspace sampling,” so named because the sampling probabilities depend on the lengths of the rows of the top singular vectors, and because they ensure that we capture entirely a certain subspace of interest.
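The column-and-row sampling idea described above can be illustrated with a minimal NumPy sketch. It is not the paper's algorithm: the function name cur_subspace_sampling, the sample sizes c and r, and the seed are assumptions introduced here; rows are sampled using the top k left singular vectors of A; and U is taken to be pinv(C) A pinv(R), a common simplification that stands in for the paper's own construction. The sketch also computes a full SVD for brevity, so it does not reflect the running time stated in the abstract.

```python
import numpy as np


def cur_subspace_sampling(A, k, c, r, seed=None):
    """Illustrative CUR sketch (hypothetical helper, not the paper's exact method)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape

    # Full SVD for brevity; only the top k singular vectors are used below.
    U_svd, _, Vt = np.linalg.svd(A, full_matrices=False)

    # Subspace-sampling probabilities: squared Euclidean lengths of the rows
    # of the matrices holding the top k right / left singular vectors.
    col_p = np.sum(Vt[:k, :] ** 2, axis=0)
    col_p /= col_p.sum()
    row_p = np.sum(U_svd[:, :k] ** 2, axis=1)
    row_p /= row_p.sum()

    # Sample column and row indices independently, with replacement.
    cols = rng.choice(n, size=c, replace=True, p=col_p)
    rows = rng.choice(m, size=r, replace=True, p=row_p)

    C = A[:, cols]  # m x c, actual columns of A
    R = A[rows, :]  # r x n, actual rows of A

    # Assumed choice of U: for fixed C and R, this minimizes ||A - C U R||_F.
    # The explicit rcond guards against inverting round-off singular values
    # when sampled columns or rows repeat.
    U = np.linalg.pinv(C, rcond=1e-10) @ A @ np.linalg.pinv(R, rcond=1e-10)
    return C, U, R


# Usage on a synthetic matrix of exact rank 3 (so A_k = A for k = 3).
rng = np.random.default_rng(0)
A = rng.standard_normal((60, 3)) @ rng.standard_normal((3, 40))
C, U, R = cur_subspace_sampling(A, k=3, c=20, r=20, seed=1)
rel_err = np.linalg.norm(A - C @ U @ R) / np.linalg.norm(A)
```

Because U is built from pseudoinverses, rescaling the sampled columns and rows would not change the product CUR in this variant. For a matrix of exact rank k, the relative error is at the level of round-off whenever the sampled columns and rows span the column and row spaces of A.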
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Drineas, P., Mahoney, M.W., Muthukrishnan, S. (2006). Subspace Sampling and Relative-Error Matrix Approximation: Column-Row-Based Methods. In: Azar, Y., Erlebach, T. (eds) Algorithms – ESA 2006. ESA 2006. Lecture Notes in Computer Science, vol 4168. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11841036_29
DOI: https://doi.org/10.1007/11841036_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-38875-3
Online ISBN: 978-3-540-38876-0
eBook Packages: Computer Science (R0)