Efficient methods for grouping vectors into low-rank clusters

https://doi.org/10.1016/j.jcp.2011.03.048

Abstract

We present a few practical algorithms for sorting vectors into low-rank clusters. These algorithms rely on a subdivision scheme applied to the space of projections from d-dimensions to 1-dimension. This subdivision scheme can be thought of as a higher-dimensional generalization of quicksort. Given the ability to quickly sort vectors into low-rank clusters, one can efficiently search a matrix for low-rank sub-blocks of large diameter. The ability to detect large-diameter low-rank sub-blocks has many applications, ranging from data-analysis to matrix-compression.

Introduction

Many techniques for matrix-compression take advantage of the reduction of dimensionality which can be attained if the matrix in question has low rank. For example, principal-component-analysis (i.e., singular-value-decomposition [13]) allows for the efficient representation of a rank-k M × M matrix with only k outer-products, requiring only ∼2kM degrees of freedom.
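As a brief numerical illustration of this reduction (a sketch, not from the article itself; the matrix size and rank are arbitrary), the SVD of a rank-k M × M matrix has only k nonzero singular values, so the matrix is captured by k outer-products:

```python
import numpy as np

# Build an exactly rank-k matrix from k outer-products (M = 200, k = 3).
rng = np.random.default_rng(0)
M, k = 200, 3
A = rng.standard_normal((M, k)) @ rng.standard_normal((k, M))

# The SVD exposes the rank-k structure: only the first k singular values
# are significant, so A is represented by ~2kM numbers instead of M^2.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = (U[:, :k] * s[:k]) @ Vt[:k, :]

print(np.allclose(A, A_k))   # the k retained outer-products reproduce A
print(s[k] / s[0] < 1e-10)   # singular values beyond the k-th are negligible
```

Here the factors U[:, :k], s[:k], Vt[:k, :] require ∼2kM degrees of freedom, versus M² for the dense matrix.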

More recent techniques for matrix-compression can be applied when the matrices in question have structurally evident low-rank sub-blocks. For example, the matrix-skeletonization techniques of [8], [14], [10] allow for accelerated matrix–vector multiplication and matrix-inversion (via interpolative-decomposition [7], [10]) when the matrix in question can be divided into sub-blocks such that (A) each off-diagonal sub-block has low rank, and (B) each diagonal sub-block can be divided into sub-blocks in a fashion similar to the original matrix. Note that, in the case of matrix-skeletonization, the low-rank sub-blocks of the original matrix must be structurally evident from the outset: each low-rank sub-block must be composed of a contiguous set of row-indices and a contiguous set of column-indices. If the original matrix has low-rank sub-blocks which do not have this evident structure, the techniques of [8], [14], [10] cannot be directly applied to compress this matrix.

It is natural to ask if similar principles can be applied when a matrix has low-rank sub-blocks which are not structurally obvious. For example, if a matrix has low-rank sub-blocks which correspond to index-subsets which are non-contiguous, is it possible to (A) detect the low-rank sub-block structure within the original matrix, and (B) take advantage of this structure to effectively compress the original matrix? In this paper we present techniques which can be used to address these questions. In Section 2 we present a few straightforward algorithms which can be used to detect low-rank clusters of vectors within a list of vectors. These algorithms can be thought of as a generalization of the well-known quicksort algorithm [5], and are practical when the cluster rank k is low (e.g., k = 1, 2, 3, 4). In Section 3.1 we make use of this cluster-finding algorithm to detect large-diameter low-rank sub-blocks of a given matrix, and present an example in which this technique is applied to data-analysis. In Section 3.2 we present matrix-compression techniques which can be used to take advantage of the low-rank block structure associated with hierarchically factorizable matrices, and present an example illustrating the effectiveness of these techniques for a specific class of random matrices.
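The notion of a low-rank cluster hidden among non-contiguous indices can be illustrated with a small sketch (hypothetical data; the brute-force direction-matching below only illustrates what a k = 1 cluster looks like, and is not the subdivision scheme of Section 2):

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_cluster, n_noise = 3, 30, 70

# Plant a rank-1 cluster: 30 vectors that are positive scalar multiples
# of one common direction, mixed in with 70 generic noise vectors.
direction = rng.standard_normal(d)
cluster = np.outer(rng.uniform(0.5, 2.0, n_cluster), direction)
noise = rng.standard_normal((n_noise, d))
vectors = np.vstack([cluster, noise])

# Normalize each vector (fixing the sign so v and -v agree); a rank-1
# cluster then shows up as many vectors sharing one unit direction.
unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
unit *= np.sign(unit[:, [0]])
members = np.where(np.linalg.norm(unit - unit[0], axis=1) < 1e-8)[0]
print(len(members))  # recovers the 30 planted vectors
```

For k = 1 this naive scan suffices; the algorithms of Section 2 instead subdivide the space of projections to one dimension, which is what makes the search practical for k up to 3 or 4.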

Section snippets

Grouping vectors into low-dimensional subspaces

Generally speaking, there are two basic problems we will consider. The first problem involves finding every low-rank cluster within a collection of vectors, and the second problem involves finding the largest low-rank cluster within a collection of vectors. These problems (stated formally below) are encountered naturally in many contexts (e.g., Problem 1 is associated with classical ‘clustering’, and Problem 2 is associated with the ‘de-noising’ of principal-component-analysis [6]).

Problem 1

Assume that

Applications

The algorithms detailed above have a complexity of O(Mk log M) when solving Problem 1, and of O(M log M) when solving Problem 2, and are quite practical for low k ⩽ 3 (i.e., when d = k + 1 is ⩽ 4). This allows for many applications which take advantage of the reduction in dimensionality that can be achieved if vectors can be grouped into low-rank clusters.

References (14)

  • E. Liberty et al.

    The mailman algorithm: a note on matrix vector multiplication

    Information Processing Letters

    (2009)
  • P.G. Martinsson et al.

    A randomized algorithm for the approximation of matrices

    Applied and Computational Harmonic Analysis

    (2011)
  • A. Asuncion, D.J. Newman, UCI machine learning repository,...
  • E. Candes et al.

    Fast computation of Fourier integral operators

    SIAM Journal on Scientific Computing

    (2007)
  • H. Cheng et al.

    On the compression of low rank matrices

    SIAM Journal on Scientific Computing

    (2005)
  • J.W. Cooley et al.

    An algorithm for the machine calculation of complex Fourier series

    Mathematics of Computation

    (1965)
  • T.H. Cormen et al.

    Introduction to Algorithms

    (1990)
There are more references available in the full text version of this article.

Cited by (2)

  • A simple filter for detecting low-rank submatrices

    2012, Journal of Computational Physics
    Citation excerpt:

    Many techniques for data-analysis and matrix-compression take advantage of the reduction of dimensionality which can be attained if a submatrix within a larger matrix has low numerical-rank [1–10].

  • Detecting low-rank clusters via random sampling

    2012, Journal of Computational Physics
    Citation excerpt:

    Many techniques for data-analysis and matrix-compression take advantage of the reduction of dimensionality which can be attained if a cluster of vectors within a larger matrix has low numerical rank [1–9].

Supported by NSF grant DMS-0914827.