Abstract
Data embedding (DE) or dimensionality reduction techniques are particularly well suited to embedding high-dimensional data into a space that in most cases will have just two dimensions. Low-dimensional space, in which data samples (data points) can more easily be visualized, is also often used for learning methods such as clustering. Sometimes, however, DE will identify dimensions that contribute little in terms of the clustering structures that they reveal. In this paper we look at regularized data embedding by clustering, and we propose a simultaneous learning approach for DE and clustering that reinforces the relationships between these two tasks. Our approach is based on a matrix decomposition technique for learning a spectral DE, a cluster membership matrix, and a rotation matrix that closely maps out the continuous spectral embedding, in order to obtain a good clustering solution. We compare our approach with some traditional clustering methods and perform numerical experiments on a collection of benchmark datasets to demonstrate its potential.
Notes
The Laplacian provides a natural link between discrete representations (such as graphs) on the one hand, and continuous representations (such as vector spaces and manifolds) on the other.
Available at http://github.com/llabiod/RSDE.
References
Affeldt S, Labiod L, Nadif M (2019) Spectral clustering via ensemble deep autoencoder learning (SC-EDAE). arXiv:1901.02291
Ailem M, Role F, Nadif M (2016) Graph modularity maximization as an effective method for co-clustering text data. Knowl Based Syst 109:160–173
Bach FR, Jordan MI (2006) Learning spectral clustering, with application to speech separation. J Mach Learn Res 7:1963–2001
Banijamali E, Ghodsi A (2017) Fast spectral clustering using autoencoders and landmarks. In: International conference image analysis and recognition, Springer, pp 380–388
Ben-Hur A, Guyon I (2003) Detecting stable clusters using principal component analysis. In: Functional genomics, Springer, pp 159–182
Bock HH (1987) On the interface between cluster analysis, principal component analysis, and multidimensional scaling. In: Multivariate statistical modeling and data analysis, Springer, pp 17–34
Boutsidis C, Kambadur P, Gittens A (2015) Spectral clustering via the power method-provably. In: International conference on machine learning, pp 40–48
Celeux G, Govaert G (1995) Gaussian parsimonious clustering models. Pattern Recognit 28(5):781–793
Chan PK, Schlag MD, Zien JY (1994) Spectral k-way ratio-cut partitioning and clustering. IEEE Trans Comput Aided Des Integr Circuits Syst 13(9):1088–1096
Chang W (1983) On using principal components before separating a mixture of two multivariate normal distributions. Appl Stat 32:267–275
Chen X, Cai D (2011) Large scale spectral clustering with landmark-based representation. In: Twenty-fifth AAAI conference on artificial intelligence, pp 313–318
Chen W, Song Y, Bai H, Lin C, Chang E (2011) Parallel spectral clustering in distributed systems. IEEE Trans Pattern Anal Mach Intell 33:568–586
De Soete G, Carroll JD (1994) K-means clustering in a low-dimensional Euclidean space. In: New approaches in classification and data analysis, Springer, pp 212–219
Dhillon I, Guan Y, Kulis B (2004) Kernel k-means, spectral clustering and normalized cuts. In: ACM SIGKDD international conference on knowledge discovery and data mining, pp 551–556
Ding C, Li T (2007) Adaptive dimension reduction using discriminant analysis and k-means clustering. In: Proceedings of the 24th international conference on machine learning, ACM, pp 521–528
Ding C, He X, Zha H, Gu M, Simon H (2001) A min max cut algorithm for graph partitioning and data clustering. In: IEEE international conference on data mining (ICDM), pp 107–114
Ding C, He X, Simon HD (2005) On the equivalence of nonnegative matrix factorization and spectral clustering. In: Proceedings of the 2005 SIAM international conference on data mining, SIAM, pp 606–610
Ding C, Li T, Jordan M (2008) Nonnegative matrix factorization for combinatorial optimization: spectral clustering, graph matching, and clique finding. In: IEEE international conference on data mining (ICDM), pp 183–192
Engel D, Hüttenberger L, Hamann B (2012) A survey of dimension reduction methods for high-dimensional data analysis and visualization. In: OASIcs OpenAccess Series in Informatics, vol 27. Schloss Dagstuhl, Leibniz-Zentrum für Informatik, pp 135–149
Fowlkes C, Belongie S, Chung F, Malik J (2004) Spectral grouping using the Nyström method. IEEE Trans Pattern Anal Mach Intell 26(2):214–225
Gattone S, Rocci R (2012) Clustering curves on a reduced subspace. J Comput Gr Stat 21(2):361–379
Gittins R (1985) Canonical analysis: a review with applications in ecology. In: Biomathematics, vol 12, Springer, Berlin
Golub G, Loan CV (1996) Matrix computations, 3rd edn. Johns Hopkins University Press, Baltimore
Govaert G, Nadif M (2013) Co-clustering: models, algorithms and applications. Wiley, New York
Govaert G, Nadif M (2018) Mutual information, phi-squared and model-based co-clustering for contingency tables. Adv Data Anal Classif 12(3):455–488
Hinton G, Salakhutdinov R (2006) Reducing the dimensionality of data with neural networks. Science 313(5786):504–507
Ji P, Zhang T, Li H, Salzmann M, Reid I (2017) Deep subspace clustering networks. In: Guyon I, Luxburg UV, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R (eds) Advances in neural information processing systems, vol 30, pp 24–33
Lee H, Battle A, Raina R, Ng A (2007) Efficient sparse coding algorithms. In: Advances in neural information processing systems (NIPS), pp 801–808
Leyli-Abadi M, Labiod L, Nadif M (2017) Denoising autoencoder as an effective dimensionality reduction and clustering of text data. In: Pacific-Asia conference on knowledge discovery and data mining, Springer, pp 801–813
Liu W, He J, Chang S (2010) Large graph construction for scalable semi-supervised learning. In: Proceedings of the 27th international conference on machine learning (ICML-10), pp 679–686
Luo D, Huang H, Ding C, Nie F (2010) On the eigenvectors of p-Laplacian. Mach Learn 81(1):37–51
von Luxburg U (2007) A tutorial on spectral clustering. Stat Comput 17(4):395–416
Ng A, Jordan M, Weiss Y (2001) On spectral clustering: analysis and an algorithm. In: Advances in neural information processing systems (NIPS), pp 849–856
Nie F, Ding C, Luo D, Huang H (2010) Improved minmax cut graph clustering with nonnegative relaxation. In: European conference on machine learning and practice of knowledge discovery in databases (ECML/PKDD), vol 6322, pp 451–466
Role F, Morbieu S, Nadif M (2019) Coclust: a python package for co-clustering. J Stat Softw 88(7):1–29
Salah A, Nadif M (2017) Model-based von Mises–Fisher co-clustering with a conscience. In: Proceedings of the 2017 SIAM international conference on data mining, SIAM, pp 246–254
Salah A, Nadif M (2019) Directional co-clustering. Adv Data Anal Classif 13(3):591–620
Schölkopf B, Smola A, Müller KR (1997) Kernel principal component analysis. In: International conference on artificial neural networks, Lausanne, Switzerland. Springer, pp 583–588
Schonemann P (1966) A generalized solution of the orthogonal procrustes problem. Psychometrika 31(1):1–10
Scrucca L (2010) Dimension reduction for model-based clustering. Stat Comput 20(4):471–484
Seuret M, Alberti M, Liwicki M, Ingold R (2017) PCA-initialized deep neural networks applied to document image analysis. In: 14th IAPR international conference on document analysis and recognition, ICDAR 2017, Kyoto, Japan, November 9–15, 2017, pp 877–882
Shi J, Malik J (2000) Normalized cuts and image segmentation. IEEE Trans Pattern Anal Mach Intell 22(8):888–905
Shinnou H, Sasaki M (2008) Spectral clustering for a large data set by reducing the similarity matrix size. In: Proceedings of the sixth international conference on language resources and evaluation (LREC), pp 201–2014
Strehl A, Ghosh J (2002) Cluster ensembles: a knowledge reuse framework for combining multiple partitions. J Mach Learn Res 3:583–617
ten Berge JM (1993) Least squares optimization in multivariate analysis. DSWO Press, Leiden
Tian K, Zhou S, Guan J (2017) Deepcluster: a general clustering framework based on deep learning. In: Ceci M, Hollmén J, Todorovski L, Vens C, Džeroski S (eds) Machine learning and knowledge discovery in databases
Van Der Maaten L, Postma E, Van den Herik J (2009) Dimensionality reduction: a comparative review. J Mach Learn Res 10:66–71
Vichi M, Kiers H (2001) Factorial k-means analysis for two-way data. Comput Stat Data Anal 37(1):49–64
Vichi M, Saporta G (2009) Clustering and disjoint principal component analysis. Comput Stat Data Anal 53(8):3194–3208
Vidal R (2011) Subspace clustering. IEEE Signal Process Mag 28(2):52–68
Wang S, Ding Z, Fu Y (2017) Feature selection guided auto-encoder. In: Thirty-first conference on artificial intelligence (AAAI), pp 2725–2731
Xie J, Girshick R, Farhadi A (2016) Unsupervised deep embedding for clustering analysis. In: International conference on machine learning, pp 478–487
Yamamoto M (2012) Clustering of functional data in a low-dimensional subspace. Adv Data Anal Classif 6(3):219–247
Yamamoto M, Hwang H (2014) A general formulation of cluster analysis with dimension reduction and subspace separation. Behaviormetrika 41(1):115–129
Yang L, Cao X, He D, Wang C, Wang X, Zhang W (2016) Modularity based community detection with deep learning. In: Proceedings of the twenty-fifth international joint conference on artificial intelligence (IJCAI), pp 2252–2258
Yang B, Fu X, Sidiropoulos N, Hong M (2017) Towards k-means-friendly spaces: simultaneous deep learning and clustering. In: Proceedings of the 34th international conference on machine learning (ICML), pp 3861–3870
Yan D, Huang L, Jordan M (2009) Fast approximate spectral clustering. In: Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, pp 907–916
Ye J, Zhao Z, Wu M (2008) Discriminative k-means for clustering. In: Advances in neural information processing systems, pp 1649–1656
Yuan Z, Yang Z, Oja E (2009) Projective nonnegative matrix factorization: sparseness, orthogonality, and clustering. Neural Process Lett 2009:11–13
Zha H, He X, Ding C, Simon H, Gu M (2002) Spectral relaxation for k-means clustering. In: Advances in neural information processing systems (NIPS), MIT Press, pp 1057–1064
Zhirong Z, Laaksonen J (2007) Projective nonnegative matrix factorization with applications to facial image processing. Int J Pattern Recognit Artif Intell 21(8):1353–1362
Appendix A: Proof of Theorem 1
The solution of (5) comes from the singular value decomposition (SVD) of \(\mathbf {X}^\top \mathbf {B}\). Let \(\mathbf {U}\mathbf {D}\mathbf {V}^\top \) be the SVD of \(\mathbf {X}^\top \mathbf {B}\); then \(\mathbf {Q}_{*} = \mathbf {U}\mathbf {V}^\top \).
Proof
We expand the matrix norm as \(\Vert \mathbf {X}-\mathbf {B}\mathbf {Q}^\top \Vert ^2 = Tr(\mathbf {X}^\top \mathbf {X}) - 2Tr(\mathbf {Q}^\top \mathbf {X}^\top \mathbf {B}) + Tr(\mathbf {Q}\mathbf {B}^\top \mathbf {B}\mathbf {Q}^\top ).\)
Since \(\mathbf {Q}^\top \mathbf {Q}= \mathbf {I}\), the last term is equal to \(Tr(\mathbf {B}^\top \mathbf {B})\); hence the original minimization problem (4) is equivalent to maximizing the middle term, i.e. (5).
With the SVD \(\mathbf {X}^\top \mathbf {B} = \mathbf {U}\mathbf {D}\mathbf {V}^\top \), the middle term becomes \(Tr(\mathbf {Q}^\top \mathbf {U}\mathbf {D}\mathbf {V}^\top ) = Tr(\mathbf {V}^\top \mathbf {Q}^\top \mathbf {U}\mathbf {D}) = Tr(\hat{\mathbf {Q}}^\top \mathbf {U}\mathbf {D}) = \sum _{i=1}^{k} d_i \varvec{\hat{q}}_i^\top \mathbf {u}_i,\)
where \(\hat{\mathbf {Q}} = \mathbf {Q}\mathbf {V}\) and \(\mathbf {D}=\mathbf {Diag}(d_1,\ldots ,d_k) \in {\mathbb {R}}_{+}^{k \times k}\). In vector form we have \(\mathbf {U}=[\mathbf {u}_1|\ldots |\mathbf {u}_k]\in {\mathbb {R}}^{d \times k}\) and \(\hat{\mathbf {Q}}=[\varvec{\hat{q}}_1|\ldots |\varvec{\hat{q}}_k] \in {\mathbb {R}}^{d \times k}\). Applying the Cauchy–Schwarz inequality, and noting that \(\mathbf {U}^\top \mathbf {U}=\mathbf {I}\) and \(\hat{\mathbf {Q}}^\top \hat{\mathbf {Q}} = \mathbf {V}^\top \mathbf {Q}^\top \mathbf {Q}\mathbf {V} = \mathbf {I}\) (since \(\mathbf {V}\) is orthogonal), we get \(\sum _{i=1}^{k} d_i \varvec{\hat{q}}_i^\top \mathbf {u}_i \le \sum _{i=1}^{k} d_i.\)
This upper bound is attained by setting \(\hat{\mathbf {Q}}= \mathbf {U}\), i.e., \(\mathbf {Q}\mathbf {V}= \mathbf {U}\); multiplying on the right by \(\mathbf {V}^\top \) gives \(\mathbf {Q}\mathbf {V}\mathbf {V}^\top = \mathbf {U}\mathbf {V}^\top \). Hence \(\mathbf {Q}_* = \mathbf {U}\mathbf {V}^\top .\)
\(\square \)
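As a sanity check, the closed form of Theorem 1 is easy to verify numerically. The following is a minimal NumPy sketch (not the authors' code; matrix sizes and variable names are illustrative): it computes \(\mathbf {Q}_* = \mathbf {U}\mathbf {V}^\top \) from the SVD of \(\mathbf {X}^\top \mathbf {B}\) and checks that no randomly drawn feasible \(\mathbf {Q}\) achieves a larger value of \(Tr(\mathbf {Q}^\top \mathbf {X}^\top \mathbf {B})\).

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 100, 10, 3

# Illustrative data matrix X (n x d) and column-orthonormal B (n x k).
X = rng.standard_normal((n, d))
B, _ = np.linalg.qr(rng.standard_normal((n, k)))

# Closed form of max_Q Tr(Q^T X^T B) s.t. Q^T Q = I:
# with SVD X^T B = U D V^T, the maximizer is Q* = U V^T.
U, D, Vt = np.linalg.svd(X.T @ B, full_matrices=False)
Q_star = U @ Vt

# At the optimum the objective equals the sum of singular values.
obj_star = np.trace(Q_star.T @ X.T @ B)
assert np.isclose(obj_star, D.sum())

# Sanity check: random feasible Q (orthonormal columns) never does better.
for _ in range(100):
    Q, _ = np.linalg.qr(rng.standard_normal((d, k)))
    assert np.trace(Q.T @ X.T @ B) <= obj_star + 1e-9
```

The objective value at \(\mathbf {Q}_*\) equals \(\sum _i d_i\), which matches the upper bound attained in the proof.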
Remark 1
Note that the problem in (5) is closely related to the Orthogonal Procrustes Problem (OPP) (Schonemann, 1966), in which \(\mathbf {Q}\) is a square orthogonal rotation matrix (i.e., \(\mathbf {Q}^\top \mathbf {Q}=\mathbf {Q}\mathbf {Q}^\top = \mathbf {I}\)). The only difference between the problem in Theorem 1 and the OPP is thus the constraint: in Theorem 1, \(\mathbf {Q}\) is merely required to have orthonormal columns (\(\mathbf {Q}^\top \mathbf {Q}=\mathbf {I}\)) rather than to be square orthogonal.
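For the square case described in this remark, SciPy provides a ready-made solver, scipy.linalg.orthogonal_procrustes. The sketch below (with illustrative random matrices, not data from the paper) checks that it returns the same \(\mathbf {U}\mathbf {V}^\top \) closed form as Theorem 1, just with a square \(\mathbf {R}\).

```python
import numpy as np
from scipy.linalg import orthogonal_procrustes

rng = np.random.default_rng(1)
n, d = 50, 4

# OPP: min_R ||A R - C||_F over square orthogonal R
# (R^T R = R R^T = I), cf. Schonemann (1966).
A = rng.standard_normal((n, d))
C = rng.standard_normal((n, d))

R, _ = orthogonal_procrustes(A, C)

# Same closed form as Theorem 1, with a square matrix:
# SVD of A^T C = U D V^T gives R = U V^T.
U, D, Vt = np.linalg.svd(A.T @ C)
assert np.allclose(R, U @ Vt)
```

Since \(\mathbf {R}\) is square here, \(\mathbf {U}\mathbf {V}^\top \) is fully orthogonal, whereas Theorem 1 only guarantees orthonormal columns.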
Labiod, L., Nadif, M. Efficient regularized spectral data embedding. Adv Data Anal Classif 15, 99–119 (2021). https://doi.org/10.1007/s11634-020-00386-8