Projected memory clustering☆
Introduction
Clustering, one of the fundamental tools of data analysis, aims to partition data into homogeneous groups; in particular, it focuses on extracting non-trivial or hidden patterns from a set of objects. Although various clustering algorithms have been proposed for grouping low-dimensional data [19], [25], [29], [38], they are difficult to apply in high dimensional spaces. Nevertheless, high dimensional data is of great importance in pattern recognition, natural language processing, computational biology, etc. [11], [26].
Subspace clustering is a class of algorithms that work well for high dimensional problems and focus on detecting groups described by arbitrary affine subspaces [20], [30], [37]. The arbitrary choice of affine subspaces negatively affects the computational complexity, which partially limits the practical applicability of such methods to big data. Projected clustering instead uses affine subspaces whose axes are parallel to the elements of the coordinate basis, see Fig. 1, which reduces the computational cost of the algorithm [17], [39]. In other words, projected clustering algorithms define a projected cluster as a pair (X, Y), where X is a subset of data points and Y is a subset of their attributes, such that the points in X are “close” when projected on the attributes in Y, but “not close” when projected on the remaining attributes, see Fig. 1. In consequence, every cluster is described by its most informative attributes, which is partially related to co-clustering [8], [24], [33].
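As a minimal numerical illustration of this definition (a toy construction of ours, not an example from the paper), the sketch below builds a single projected cluster in R^10 that is tight on the attribute subset Y = {0, 1, 2} and spread on the remaining attributes; within the cluster, coordinate-wise variance already exposes Y:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy projected cluster: 200 points in R^10, tight on Y = {0, 1, 2}
# ("close" when projected on Y) and spread uniformly elsewhere
# ("not close" on the remaining attributes).
n, d, Y = 200, 10, [0, 1, 2]
X = rng.uniform(-5.0, 5.0, size=(n, d))
X[:, Y] = 1.0 + 0.05 * rng.standard_normal((n, len(Y)))

# Coordinate-wise variance separates the two attribute groups: it is
# tiny on the relevant attributes and large on the irrelevant ones.
var = X.var(axis=0)
print(sorted(np.argsort(var)[:3]))   # -> [0, 1, 2], the attributes in Y
```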
In this paper we propose an efficient projected clustering algorithm, PMC (projected memory clustering), which can process high dimensional data with more than 10^6 attributes. It adapts a recent state-of-the-art subspace clustering algorithm, SuMC [28], to the projected case. Optimizing the PMC objective function requires computing coordinate-wise variances (instead of cluster eigenvalues, as in SuMC), which is linear with respect to both the data dimension and the number of samples, see Theorem 3.1. Theoretical details of PMC, together with an optimization algorithm, are given in Section 3.
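The linearity claim can be made concrete with a small sketch (our illustration; the array sizes are arbitrary). The per-cluster statistic a coordinate-wise criterion needs is the vector of attribute variances, computable in one pass over the data, whereas a SuMC-style subspace step requires the eigenvalues of an N × N covariance matrix, which is out of reach for N on the order of 10^6:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((1_000, 5_000))   # n samples, N attributes

# Coordinate-wise variances: one pass, O(n * N) time and O(N) memory,
# so it still scales when N grows towards ~10^6 attributes.
var = X.var(axis=0)

# The eigenvalue route costs O(n * N^2 + N^3) time and O(N^2) memory for
# the covariance matrix alone; already heavy at N = 5000 and hopeless at
# N ~ 10^6 (the matrix would need terabytes of storage):
# eig = np.linalg.eigvalsh(np.cov(X, rowvar=False))
```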
Experiments performed on synthetic and real datasets show that PMC recovers the original data structure (measured by the Adjusted Rand Index) better than related projected clustering methods, see Section 4. Moreover, it is competitive with, or even better than, state-of-the-art subspace clustering methods on high dimensional data, which is of great practical importance in big data applications, see Fig. 2. To briefly illustrate its effect, we present the results of PMC applied to the MNIST data. Fig. 2 shows the coordinates used to describe each axis-parallel cluster (we use ten clusters). For better visualization, we also present the mean of each cluster. The confusion matrix, presented in Table 1, demonstrates that PMC was able to identify correct patterns for most clusters (except digits 4, 5, and 7). One can also discover nonlinear structures by applying PMC to a data set transformed by nonlinear basis functions, such as RBFs (radial basis functions) [23], see Fig. 3.
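As a rough sketch of this nonlinear route (a toy construction of ours; Fig. 3 refers to the paper's own experiment), two concentric rings form no axis-parallel clusters in R^2, but an RBF feature map built on a handful of data points as centers yields a representation on which a projected method can then operate:

```python
import numpy as np

def rbf_features(X, centers, gamma=1.0):
    """Map each point x to (exp(-gamma * ||x - c||^2))_c over the centers c,
    a standard RBF feature map; clustering is then run on the result."""
    sq_dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

# Two concentric rings: no axis-parallel cluster structure in R^2.
rng = np.random.default_rng(3)
t = rng.uniform(0.0, 2.0 * np.pi, 400)
r = np.where(rng.random(400) < 0.5, 1.0, 3.0)
X = np.c_[r * np.cos(t), r * np.sin(t)]

# Transform into a 20-dimensional RBF feature space.
Z = rbf_features(X, centers=X[rng.choice(400, 20, replace=False)], gamma=0.5)
print(Z.shape)   # (400, 20)
```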
We summarize the main contributions of our paper:
- 1.
We modify the SuMC objective function to define a clustering model that discovers axis-parallel affine subspaces. This allows each cluster to be interpreted through its most informative features, analogously to co-clustering.
- 2.
We propose an extremely efficient algorithm for its optimization, which can easily process data with more than 10^6 attributes.
- 3.
Experiments performed on artificial data show its suitability for detecting axis-parallel clusters and confirm its low computational complexity.
- 4.
An experimental study demonstrates that PMC gives better results than state-of-the-art projected and subspace clustering methods on very high dimensional data, which is crucial in practical use cases and big data applications.
Section snippets
Related works
Subspace clustering has received considerable attention in recent years due to the growing number of high dimensional practical problems [20], [27], [30], [37]. Prior work includes iterative methods [3], which alternate between assigning the data points to the identified subspaces and updating the subspaces. Algebraic approaches aim to describe clusters using polynomials whose gradients at a point are orthogonal to the subspace containing that point [31]. Variations of spectral clustering focus on
Projected clustering model
Our method can be understood as a modification of SuMC [28], a recent subspace clustering method. Instead of looking for an arbitrary subspace for each cluster, we restrict our attention to subspaces parallel to the main axes of the canonical basis, and therefore obtain an axis-parallel projected clustering method. More precisely, we use an affine subspace defined as

mean(X_i) + span(e_{j_1}, …, e_{j_m}),

where mean(X_i) is the mean of X_i, m ≤ N, and (e_j)_j is the canonical basis of ℝ^N.
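Under this reading of the model (a sketch of ours, not the paper's implementation), projecting a point onto such a subspace keeps the coordinates j_1, …, j_m and replaces the remaining ones with the cluster mean; the mean squared projection error then equals the sum of coordinate-wise variances over the dropped axes, which is why the optimization only needs those variances:

```python
import numpy as np

def project_axis_parallel(X, S):
    """Project each row of X onto mean(X) + span(e_j : j in S):
    coordinates in S are kept, the rest are replaced by the mean."""
    P = np.tile(X.mean(axis=0), (X.shape[0], 1))
    P[:, S] = X[:, S]
    return P

# Sanity check: mean squared projection error == sum of coordinate-wise
# variances over the dropped axes.
rng = np.random.default_rng(2)
X = rng.standard_normal((300, 8))
S = [0, 2, 5]
dropped = [j for j in range(8) if j not in S]
mse = ((X - project_axis_parallel(X, S)) ** 2).sum(axis=1).mean()
print(np.isclose(mse, X.var(axis=0)[dropped].sum()))   # True
```

When this error is minimized for a fixed m, the dropped coordinates are the low-variance ones — precisely the attributes on which the cluster's points stay close to the mean, i.e. the "close" attributes in the sense of the introduction.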
Our basic
Experiments
In this section we present the evaluation of our method, implemented in C. All experiments were run on an Ubuntu 16.04 (64-bit) workstation with a 3.3 GHz quad-core Intel Xeon processor and 32 GB of RAM.
We compare our method with leading state-of-the-art projected clustering approaches: PROCLUS, P3C [17], PreDeCon [2] (implemented in Java, Elki
Conclusion
In this paper, we have presented PMC, a new algorithm for finding axis-parallel clusters. Making use of compression theory, we obtained a method that automatically detects the optimal dimensions of clusters. Moreover, the axis-parallel character of the clusters yields the most informative coordinates within each cluster. Extensive experiments performed on various types of data showed that PMC detects clustering structures better than related projected clustering methods, in a reasonable amount of time.
Acknowledgements
The work of P. Spurek was supported by the National Centre of Science (Poland) Grant No. UMO-2015/19/D/ST6/01472. The work of J. Tabor, Ł. Struski and M. Śmieja was supported by the National Centre of Science (Poland) Grant No. UMO-2017/25/B/ST6/01271.
References (39)
- et al., Three learning phases for radial-basis-function networks, Neural Netw. (2001)
- et al., Lossy compression approach to subspace clustering, Inf. Sci. (2018)
- et al., Cross-entropy clustering, Pattern Recognit. (2014)
- et al., Fast algorithms for projected clustering, ACM SIGMOD Record (1999)
- et al., Density connected clustering with local subspace preferences, in: Proceedings of the Fourth IEEE International Conference on Data Mining (ICDM'04) (2004)
- et al., K-plane clustering, J. Global Optim. (2000)
- et al., Locally adaptive metrics for clustering high dimensional data, Data Min. Knowl. Discov. (2007)
- et al., Sparse subspace clustering: algorithm, theory, and applications, IEEE Trans. Pattern Anal. Mach. Intell. (2013)
- et al., A density-based algorithm for discovering clusters in large spatial databases with noise, in: KDD (1996)
- The Minimum Description Length Principle (2007)
- Metacluster-based projective clustering ensembles, Mach. Learn.
- Comparing partitions, J. Classif.
- Principal Component Analysis
- Clustering high-dimensional data: a survey on subspace clustering, pattern-based clustering, and correlation clustering, ACM Trans. Knowl. Discov. Data (TKDD)
- Learning multiple layers of features from tiny images, Technical Report
- Gradient-based learning applied to document recognition, Proc. IEEE
- Robust and efficient subspace segmentation via least squares regression, in: European Conference on Computer Vision
- New routes from minimal approximation error to principal components, Neural Process. Lett.
- ☆
Conflicts of Interest Statement: The authors whose names are listed immediately below certify that they have NO affiliations with or involvement in any organization or entity with any financial interest (such as honoraria; educational grants; participation in speakers' bureaus; membership, employment, consultancies, stock ownership, or other equity interest; and expert testimony or patent-licensing arrangements) or non-financial interest (such as personal or professional relationships, affiliations, knowledge or beliefs) in the subject matter or materials discussed in this manuscript.