Efficient methods for grouping vectors into low-rank clusters☆
Introduction
Many techniques for matrix-compression take advantage of the reduction of dimensionality which can be attained if the matrix in question has low rank. For example, principal-component-analysis (i.e., singular-value-decomposition [13]) allows for the efficient representation of a rank-k M × M matrix with only k outer-products, requiring only ∼2kM degrees of freedom.
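As a small illustration of the storage count quoted above, the following sketch builds a synthetic rank-k M × M matrix and recovers it from k outer-products via a truncated singular-value-decomposition. The matrix, the sizes `M` and `k`, and the variable names are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

# Assumed sizes for illustration only.
M, k = 200, 3
rng = np.random.default_rng(0)

# Synthetic rank-k matrix: product of an M x k and a k x M random factor.
A = rng.standard_normal((M, k)) @ rng.standard_normal((k, M))

# Full SVD, then keep only the k leading singular triplets.
U, s, Vt = np.linalg.svd(A)
U_k = U[:, :k] * s[:k]   # M x k left factor, scaled by the singular values
V_k = Vt[:k, :]          # k x M right factor

# Sum of k outer-products, written as a single matrix product.
A_k = U_k @ V_k

stored = U_k.size + V_k.size                           # 2*k*M numbers
rel_err = np.linalg.norm(A - A_k) / np.linalg.norm(A)  # relative Frobenius error
print(stored, M * M, rel_err)
```

The two factors hold 2kM numbers in place of the M² entries of the full matrix, and the reconstruction error is at the level of floating-point round-off because the synthetic matrix has rank exactly k.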
More recent techniques for matrix-compression can be applied when the matrices in question have structurally evident low-rank sub-blocks. For example, the matrix-skeletonization techniques of [8], [14], [10] allow for accelerated matrix–vector multiplication and matrix-inversion (via interpolative-decomposition [7], [10]) when the matrix in question can be divided into sub-blocks such that (A) each off-diagonal sub-block has low rank, and (B) each diagonal sub-block can be divided into sub-blocks in a fashion similar to the original matrix. Note that, in the case of matrix-skeletonization, the low-rank sub-blocks of the original matrix must be structurally evident from the outset: each low-rank sub-block must be composed of a contiguous set of row-indices and a contiguous set of column-indices. If the original matrix has low-rank sub-blocks which do not have this evident structure, the techniques of [8], [14], [10] cannot be directly applied to compress this matrix.
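To make condition (A) concrete, the sketch below compares the numerical rank of a diagonal sub-block and an off-diagonal sub-block at the first level of subdivision. The kernel 1/(1 + |x_i − x_j|) on equispaced points, the tolerance, and the helper `numerical_rank` are all assumptions chosen for illustration; this is not the skeletonization procedure of [8], [14], [10], only a check of the block structure those methods rely on.

```python
import numpy as np

def numerical_rank(B, tol=1e-10):
    """Count singular values of B above tol times the largest one (hypothetical helper)."""
    s = np.linalg.svd(B, compute_uv=False)
    return int(np.sum(s > tol * s[0]))

# Assumed example: a kernel matrix on N sorted points in [0, 1].
N = 200
x = np.linspace(0.0, 1.0, N)
A = 1.0 / (1.0 + np.abs(x[:, None] - x[None, :]))

# First level of subdivision: contiguous halves of the index set.
h = N // 2
rank_diag = numerical_rank(A[:h, :h])      # diagonal sub-block
rank_offdiag = numerical_rank(A[:h, h:])   # off-diagonal sub-block

print(rank_diag, rank_offdiag)
```

In this example the off-diagonal sub-block couples two disjoint intervals, where the kernel is smooth, so its numerical rank is far below that of the diagonal sub-block. The low-rank block is visible only because the points are sorted, so that each block corresponds to contiguous index ranges.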
It is natural to ask if similar principles can be applied when a matrix has low-rank sub-blocks which are not structurally obvious. For example, if a matrix has low-rank sub-blocks which correspond to index-subsets which are non-contiguous, is it possible to (A) detect the low-rank sub-block structure within the original matrix, and (B) take advantage of this structure to effectively compress the original matrix? In this paper we present techniques which can be used to address these questions. In Section 2 we present a few straightforward algorithms which can be used to detect low-rank clusters of vectors within a list of vectors. These algorithms can be thought of as a generalization of the well-known quicksort algorithm [5], and are practical when the cluster rank k is low (e.g., k = 1, 2, 3, 4). In Section 3.1 we make use of this cluster-finding algorithm to detect large-diameter low-rank sub-blocks of a given matrix, and present an example in which this technique is applied to data-analysis. In Section 3.2 we present matrix-compression techniques which can be used to take advantage of the low-rank block structure associated with hierarchically factorizable matrices, and present an example illustrating the effectiveness of these techniques for a specific class of random matrices.
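The situation described above can be reproduced with a small synthetic example. The sketch below plants a rank-k sub-block on non-contiguous row- and column-indices of an otherwise random matrix, then compares it with a contiguous block of the same size. The sizes, the index pattern (every third index), and the helper `numerical_rank` are illustrative assumptions; the code only constructs the hidden structure and does not detect it.

```python
import numpy as np

def numerical_rank(B, tol=1e-10):
    """Count singular values of B above tol times the largest one (hypothetical helper)."""
    s = np.linalg.svd(B, compute_uv=False)
    return int(np.sum(s > tol * s[0]))

# Assumed sizes for illustration only.
M, m, k = 60, 20, 2
rng = np.random.default_rng(1)
A = rng.standard_normal((M, M))

# Non-contiguous index subsets: every third row and every third column.
rows = np.arange(0, M, 3)   # 0, 3, 6, ...
cols = np.arange(1, M, 3)   # 1, 4, 7, ...

# Overwrite the selected sub-block with a rank-k matrix.
A[np.ix_(rows, cols)] = rng.standard_normal((m, k)) @ rng.standard_normal((k, m))

hidden_rank = numerical_rank(A[np.ix_(rows, cols)])   # rank of the planted sub-block
contiguous_rank = numerical_rank(A[:m, :m])           # rank of a contiguous m x m block

print(hidden_rank, contiguous_rank)
```

The planted sub-block has rank k, yet a contiguous block of the same size shows no rank deficiency, which is why methods that assume contiguous index sets cannot exploit this structure directly.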
Section snippets
Grouping vectors into low-dimensional subspaces
Generally speaking, there are two basic problems we will consider. The first problem involves finding every low-rank cluster within a collection of vectors, and the second problem involves finding the largest low-rank cluster within a collection of vectors. These problems (stated formally below) are encountered naturally in many contexts (e.g., Problem 1 is associated with classical ‘clustering’, and Problem 2 is associated with the ‘de-noising’ of principal-component-analysis [6]). Problem 1 Assume that
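The formal problem statements are truncated here, so the following is only a simplified sketch of the second problem in the smallest case: vectors in d = 2 dimensions and clusters of rank k = 1. It is not the algorithm of the paper. It uses plain sorting of direction angles, and the function name `largest_rank1_cluster_2d`, the angular tolerance `tol`, and the test data are assumptions. All input vectors are assumed to be nonzero.

```python
import numpy as np

def largest_rank1_cluster_2d(X, tol=1e-6):
    """Indices of the largest set of rows of X (shape (M, 2)) whose directions,
    taken modulo sign, fall inside an angular window of width tol."""
    M = X.shape[0]
    # Direction of each vector modulo pi, so that v and -v share one angle.
    theta = np.mod(np.arctan2(X[:, 1], X[:, 0]), np.pi)
    order = np.argsort(theta)
    t = theta[order]
    # Append a copy shifted by pi so windows can wrap around from pi back to 0.
    t_ext = np.concatenate([t, t + np.pi])
    # For each start i, count sorted angles inside [t[i], t[i] + tol].
    ends = np.searchsorted(t_ext, t + tol, side="right")
    counts = np.minimum(ends - np.arange(M), M)
    i = int(np.argmax(counts))
    return np.sort(order[np.arange(i, i + counts[i]) % M])

# Assumed test data: 50 random vectors plus 30 multiples of one direction.
rng = np.random.default_rng(2)
noise = rng.standard_normal((50, 2))
direction = np.array([np.cos(0.7), np.sin(0.7)])
scales = rng.uniform(0.5, 2.0, size=30) * rng.choice([-1.0, 1.0], size=30)
X = np.vstack([noise, scales[:, None] * direction])

found = largest_rank1_cluster_2d(X)
print(len(found), found.min(), found.max())
```

The cost is dominated by the sort, so this special case runs in O(M log M) operations. In higher dimensions a single sort no longer suffices, since the set of candidate directions is no longer one-dimensional.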
Applications
The algorithms detailed above have a complexity of O(MklogM) when solving Problem 1, and of O(MlogM) when solving Problem 2, and are quite practical for low rank k ⩽ 3 (i.e., when d = k + 1 ⩽ 4). This allows for many applications which take advantage of the reduction in dimensionality that can be achieved if vectors can be grouped into low-rank clusters.
References (14)
- et al. The mailman algorithm: a note on matrix vector multiplication. Information Processing Letters (2009)
- et al. A randomized algorithm for the approximation of matrices. Applied and Computational Harmonic Analysis (2011)
- A. Asuncion, D.J. Newman, UCI machine learning repository,...
- et al. Fast computation of Fourier integral operators. SIAM Journal on Scientific Computing (2007)
- et al. On the compression of low rank matrices. SIAM Journal on Scientific Computing (2005)
- et al. An algorithm for the machine calculation of complex Fourier series. Mathematics of Computation (1965)
- et al. Introduction to Algorithms (1990)
☆ Supported by NSF grant DMS-0914827.