
Efficient regularized spectral data embedding

  • Regular Article
  • Published in Advances in Data Analysis and Classification

Abstract

Data embedding (DE) or dimensionality reduction techniques are particularly well suited to embedding high-dimensional data into a space that in most cases will have just two dimensions. Low-dimensional space, in which data samples (data points) can more easily be visualized, is also often used for learning methods such as clustering. Sometimes, however, DE will identify dimensions that contribute little in terms of the clustering structures that they reveal. In this paper we look at regularized data embedding by clustering, and we propose a simultaneous learning approach for DE and clustering that reinforces the relationships between these two tasks. Our approach is based on a matrix decomposition technique for learning a spectral DE, a cluster membership matrix, and a rotation matrix that closely maps out the continuous spectral embedding, in order to obtain a good clustering solution. We compare our approach with some traditional clustering methods and perform numerical experiments on a collection of benchmark datasets to demonstrate its potential.
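To make the two building blocks mentioned above more concrete, the sketch below computes a standard spectral embedding from the normalized graph Laplacian and then clusters the embedded points with k-means. It is a generic illustration only, not the regularized joint formulation proposed in this paper; the Gaussian affinity, the bandwidth sigma, and the use of scikit-learn's KMeans are assumptions made for the example.

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans

def spectral_embed_and_cluster(X, k, sigma=1.0):
    """Generic spectral embedding + k-means baseline (illustration only)."""
    # Gaussian affinity matrix between the n data points
    S = np.exp(-cdist(X, X, 'sqeuclidean') / (2.0 * sigma ** 2))
    np.fill_diagonal(S, 0.0)
    # Symmetrically normalized Laplacian: L_sym = I - D^{-1/2} S D^{-1/2}
    deg = S.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(deg, 1e-12)))
    L_sym = np.eye(S.shape[0]) - D_inv_sqrt @ S @ D_inv_sqrt
    # The k eigenvectors with smallest eigenvalues give the continuous embedding B
    _, eigvecs = np.linalg.eigh(L_sym)   # eigh returns ascending eigenvalues
    B = eigvecs[:, :k]
    # Cluster the embedded points to obtain a partition
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(B)
    return B, labels
```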

Notes

  1. The Laplacian provides a natural link between discrete representations (such as graphs) on the one hand, and continuous representations (such as vector spaces and manifolds) on the other.

  2. http://www.uni-marburg.de/fb12/arbeitsgruppen/datenbionik/data.

  3. http://www.rdocumentation.org/packages/clustrd/versions/1.2.0/topics/cluspca.

  4. Available at http://github.com/llabiod/RSDE.

  5. http://www.uni-marburg.de/fb12/arbeitsgruppen/datenbionik/data.

  6. http://yann.lecun.com/exdb/mnist/.

  7. http://archive.ics.uci.edu/ml/datasets/.

  8. http://archive.ics.uci.edu/ml/datasets/letter+recognition.

References

  • Affeldt S, Labiod L, Nadif M (2019) Spectral clustering via ensemble deep autoencoder learning (SC-EDAE). arXiv:1901.02291

  • Ailem M, Role F, Nadif M (2016) Graph modularity maximization as an effective method for co-clustering text data. Knowl Based Syst 109:160–173

  • Bach FR, Jordan MI (2006) Learning spectral clustering, with application to speech separation. J Mach Learn Res 7:1963–2001

  • Banijamali E, Ghodsi A (2017) Fast spectral clustering using autoencoders and landmarks. In: International conference image analysis and recognition, Springer, pp 380–388

  • Ben-Hur A, Guyon I (2003) Detecting stable clusters using principal component analysis. In: Functional genomics, Springer, pp 159–182

  • Bock HH (1987) On the interface between cluster analysis, principal component analysis, and multidimensional scaling. In: Multivariate statistical modeling and data analysis, Springer, pp 17–34

  • Boutsidis C, Kambadur P, Gittens A (2015) Spectral clustering via the power method-provably. In: International conference on machine learning, pp 40–48

  • Celeux G, Govaert G (1995) Gaussian parsimonious clustering models. Pattern Recognit 28(5):781–793

  • Chan PK, Schlag MD, Zien JY (1994) Spectral k-way ratio-cut partitioning and clustering. IEEE Trans Comput Aided Des Integr Circuits Syst 13(9):1088–1096

  • Chang W (1983) On using principal components before separating a mixture of two multivariate normal distributions. Appl Stat 32:267–275

  • Chen X, Cai D (2011) Large scale spectral clustering with landmark-based representation. In: Twenty-fifth AAAI conference on artificial intelligence, pp 313–318

  • Chen W, Song Y, Bai H, Lin C, Chang E (2011) Parallel spectral clustering in distributed systems. IEEE Trans Pattern Anal Mach Intell 33:568–586

  • De Soete G, Carroll JD (1994) K-means clustering in a low-dimensional Euclidean space. In: New approaches in classification and data analysis, Springer, pp 212–219

  • Dhillon I, Guan Y, Kulis B (2004) Kernel k-means, spectral clustering and normalized cuts. In: ACM SIGKDD international conference on knowledge discovery and data mining, pp 551–556

  • Ding C, Li T (2007) Adaptive dimension reduction using discriminant analysis and k-means clustering. In: Proceedings of the 24th international conference on machine learning, ACM, pp 521–528

  • Ding C, He X, Zha H, Gu M, Simon H (2001) A min-max cut algorithm for graph partitioning and data clustering. In: IEEE international conference on data mining (ICDM), pp 107–114

  • Ding C, He X, Simon HD (2005) On the equivalence of nonnegative matrix factorization and spectral clustering. In: Proceedings of the 2005 SIAM international conference on data mining, SIAM, pp 606–610

  • Ding C, Li T, Jordan M (2008) Nonnegative matrix factorization for combinatorial optimization: spectral clustering, graph matching, and clique finding. In: IEEE international conference on data mining (ICDM), pp 183–192

  • Engel D, Hüttenberger L, Hamann B (2012) A survey of dimension reduction methods for high-dimensional data analysis and visualization. In: OAIS open access series in informatics, Schloss Dagstuhl, Leibniz-Zentrum fuer Informatik, vol 27, pp 135–149

  • Fowlkes C, Belongie S, Chung F, Malik J (2004) Spectral grouping using the Nyström method. IEEE Trans Pattern Anal Mach Intell 26(2):214–225

  • Gattone S, Rocci R (2012) Clustering curves on a reduced subspace. J Comput Gr Stat 21(2):361–379

  • Gittins R (1985) Canonical analysis: a review with applications in ecology. In: Biomathematics, vol 12, Springer, Berlin

  • Golub G, Loan CV (1996) Matrix computations, 3rd edn. Johns Hopkins University Press, Baltimore

  • Govaert G, Nadif M (2013) Co-clustering: models, algorithms and applications. Wiley, New York

  • Govaert G, Nadif M (2018) Mutual information, phi-squared and model-based co-clustering for contingency tables. Adv Data Anal Classif 12(3):455–488

  • Hinton G, Salakhutdinov R (2006) Reducing the dimensionality of data with neural networks. Science 313(5786):504–507

  • Ji P, Zhang T, Li H, Salzmann M, Reid I (2017) Deep subspace clustering networks. In: Guyon I, Luxburg UV, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R (eds) Advances in neural information processing systems, vol 30, pp 24–33

  • Lee H, Battle A, Raina R, Ng A (2007) Efficient sparse coding algorithms. In: Advances in neural information processing systems (NIPS), pp 801–808

  • Leyli-Abadi M, Labiod L, Nadif M (2017) Denoising autoencoder as an effective dimensionality reduction and clustering of text data. In: Pacific-Asia conference on knowledge discovery and data mining, Springer, pp 801–813

  • Liu W, He J, Chang S (2010) Large graph construction for scalable semi-supervised learning. In: Proceedings of the 27th international conference on machine learning (ICML-10), pp 679–686

  • Luo D, Huang H, Ding C, Nie F (2010) On the eigenvectors of p-Laplacian. Mach Learn 81(1):37–51

  • Luxburg U (2007) A tutorial on spectral clustering. Stat Comput 17(4):395–416

  • Ng A, Jordan M, Weiss Y (2001) On spectral clustering: analysis and an algorithm. In: Advances in neural information processing systems (NIPS), pp 849–856

  • Nie F, Ding C, Luo D, Huang H (2010) Improved minmax cut graph clustering with nonnegative relaxation. In: European conference on machine learning and practice of knowledge discovery in databases (ECML/PKDD), vol 6322, pp 451–466

  • Role F, Morbieu S, Nadif M (2019) CoClust: a Python package for co-clustering. J Stat Softw 88(7):1–29

  • Salah A, Nadif M (2017) Model-based von Mises-Fisher co-clustering with a conscience. In: Proceedings of the 2017 SIAM international conference on data mining, SIAM, pp 246–254

  • Salah A, Nadif M (2019) Directional co-clustering. Adv Data Anal Classif 13(3):591–620

  • Schölkopf B, Smola A, Müller KR (1997) Kernel principal component analysis. In: International conference on artificial neural networks. Lausanne, Switzerland, Springer, pp 583–588

  • Schonemann P (1966) A generalized solution of the orthogonal Procrustes problem. Psychometrika 31(1):1–10

  • Scrucca L (2010) Dimension reduction for model-based clustering. Stat Comput 20(4):471–484

  • Seuret M, Alberti M, Liwicki M, Ingold R (2017) PCA-initialized deep neural networks applied to document image analysis. In: 14th IAPR international conference on document analysis and recognition, ICDAR 2017, Kyoto, Japan, November 9–15, 2017, pp 877–882

  • Shi J, Malik J (2000) Normalized cuts and image segmentation. IEEE Trans Pattern Anal Mach Intell 22(8):888–905

  • Shinnou H, Sasaki M (2008) Spectral clustering for a large data set by reducing the similarity matrix size. In: Proceedings of the sixth international conference on language resources and evaluation (LREC), pp 201–2014

  • Strehl A, Ghosh J (2002) Cluster ensembles: a knowledge reuse framework for combining multiple partitions. J Mach Learn Res 3:583–617

  • ten Berge JM (1993) Least squares optimization in multivariate analysis. DSWO Press, Leiden

  • Tian K, Zhou S, Guan J (2017) Deepcluster: a general clustering framework based on deep learning. In: Ceci M, Hollmén J, Todorovski L, Vens C, Džeroski S (eds) Machine learning and knowledge discovery in databases

  • Van Der Maaten L, Postma E, Van den Herik J (2009) Dimensionality reduction: a comparative review. J Mach Learn Res 10:66–71

  • Vichi M, Kiers H (2001) Factorial k-means analysis for two-way data. Comput Stat Data Anal 37(1):49–64

  • Vichi M, Saporta G (2009) Clustering and disjoint principal component analysis. Comput Stat Data Anal 53(8):3194–3208

  • Vidal R (2011) Subspace clustering. IEEE Signal Process Mag 28(2):52–68

  • Wang S, Ding Z, Fu Y (2017) Feature selection guided auto-encoder. In: Thirty-first conference on artificial intelligence (AAAI), pp 2725–2731

  • Xie J, Girshick R, Farhadi A (2016) Unsupervised deep embedding for clustering analysis. In: International conference on machine learning, pp 478–487

  • Yamamoto M (2012) Clustering of functional data in a low-dimensional subspace. Adv Data Anal Classif 6(3):219–247

  • Yamamoto M, Hwang H (2014) A general formulation of cluster analysis with dimension reduction and subspace separation. Behaviormetrika 41(1):115–129

  • Yang L, Cao X, He D, Wang C, Wang X, Zhang W (2016) Modularity based community detection with deep learning. In: Proceedings of the twenty-fifth international joint conference on artificial intelligence (IJCAI), pp 2252–2258

  • Yang B, Fu X, Sidiropoulos N, Hong M (2017) Towards k-means-friendly spaces: simultaneous deep learning and clustering. In: Proceedings of the 34th international conference on machine learning (ICML), pp 3861–3870

  • Yan D, Huang L, Jordan M (2009) Fast approximate spectral clustering. In: Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, pp 907–916

  • Ye J, Zhao Z, Wu M (2008) Discriminative k-means for clustering. In: Advances in neural information processing systems, pp 1649–1656

  • Yuan Z, Yang Z, Oja E (2009) Projective nonnegative matrix factorization: sparseness, orthogonality, and clustering. Neural Process Lett 2009:11–13

  • Zha H, He X, Ding C, Simon H, Gu M (2002) Spectral relaxation for k-means clustering. In: Advances in neural information processing systems (NIPS), MIT Press, pp 1057–1064

  • Zhirong Z, Laaksonen J (2007) Projective nonnegative matrix factorization with applications to facial image processing. J Pattern Recognit Artif Intell 21(8):1353–1362

Author information

Corresponding author

Correspondence to Lazhar Labiod.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A: Proof of Theorem 1

The solution of (5) is obtained from the singular value decomposition (SVD) of \(\mathbf {X}^\top \mathbf {B}\): if \(\mathbf {U}\mathbf {D}\mathbf {V}^\top \) is the SVD of \(\mathbf {X}^\top \mathbf {B}\), then \(\mathbf {Q}_{*} = \mathbf {U}\mathbf {V}^\top \).

Proof

We expand the matrix norm

$$\begin{aligned} \left\| \mathbf {X}- \mathbf {B}\mathbf {Q}^\top \right\| ^{2} = Tr(\mathbf {X}^\top \mathbf {X}) - 2 Tr(\mathbf {X}^\top \mathbf {B}\mathbf {Q}^\top )+ Tr(\mathbf {Q}\mathbf {B}^\top \mathbf {B}\mathbf {Q}^\top ). \end{aligned}$$
(16)

Since \(\mathbf {Q}^\top \mathbf {Q}= \mathbf {I}\), the last term equals \(Tr(\mathbf {B}^\top \mathbf {B})\); as neither the first nor the last term depends on \(\mathbf {Q}\), the original minimization problem (4) is equivalent to maximizing the middle term, i.e., problem (5).

With the SVD of \(\mathbf {X}^\top \mathbf {B}\) = \(\mathbf {U}\mathbf {D}\mathbf {V}^\top \), the middle term becomes

$$\begin{aligned} Tr(\mathbf {X}^\top \mathbf {B}\mathbf {Q}^\top )= & {} Tr(\mathbf {U}\mathbf {D}\mathbf {V}^\top \mathbf {Q}^\top ) \nonumber \\= & {} Tr(\mathbf {U}\mathbf {D}\hat{\mathbf {Q}}^\top ) \quad \text{ where } \quad \hat{\mathbf {Q}}=\mathbf {Q}\mathbf {V}\nonumber \\= & {} Tr(\hat{\mathbf {Q}}^\top \mathbf {U}\mathbf {D}). \end{aligned}$$
(17)

where \(\mathbf {D}=\mathbf {Diag}(d_1,\ldots ,d_k) \in {\mathbb {R}}_{+}^{k \times k}\). In column form we write \(\mathbf {U}=[\mathbf {u}_1|\ldots |\mathbf {u}_k]\in {\mathbb {R}}^{d \times k}\) and \(\hat{\mathbf {Q}}=[\varvec{\hat{q}}_1|\ldots |\varvec{\hat{q}}_k] \in {\mathbb {R}}^{d \times k}\). Since \(\mathbf {U}^\top \mathbf {U}=\mathbf {I}\) and \(\hat{\mathbf {Q}}^\top \hat{\mathbf {Q}} = \mathbf {V}^\top \mathbf {Q}^\top \mathbf {Q}\mathbf {V}=\mathbf {I}\), applying the Cauchy–Schwarz inequality gives

$$\begin{aligned} Tr(\hat{\mathbf {Q}}^\top \mathbf {U}\mathbf {D}) \le \sum _{i}d_i ||\mathbf {u}_i||\times ||\varvec{\hat{q}}_i||=\sum _i d_i= Tr(\mathbf {D}). \end{aligned}$$

The upper bound is attained by setting \(\hat{\mathbf {Q}}= \mathbf {U}\), i.e., \(\mathbf {Q}\mathbf {V}= \mathbf {U}\). Multiplying on the right by \(\mathbf {V}^\top \) and using \(\mathbf {V}\mathbf {V}^\top =\mathbf {I}\) yields \(\mathbf {Q}_* = \mathbf {U}\mathbf {V}^\top .\)

\(\square \)

Remark 1

Note that the problem in (5) can be viewed as a variant of the Orthogonal Procrustes Problem (OPP) (Schonemann 1966), in which \(\mathbf {Q}\) is a square orthogonal rotation matrix (i.e., \(\mathbf {Q}^\top \mathbf {Q}=\mathbf {Q}\mathbf {Q}^\top = \mathbf {I}\)). The only difference between the problem in Theorem 1 and OPP is therefore the constraint on \(\mathbf {Q}\): here \(\mathbf {Q}\) is merely required to have orthonormal columns, \(\mathbf {Q}^\top \mathbf {Q}=\mathbf {I}\).
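
As a numerical sanity check on Theorem 1, the following NumPy snippet (an illustrative sketch, not the authors' RSDE implementation; the sizes n, d, k and the random \(\mathbf {X}\) and \(\mathbf {B}\) are arbitrary placeholders) computes \(\mathbf {Q}_* = \mathbf {U}\mathbf {V}^\top \) from the SVD of \(\mathbf {X}^\top \mathbf {B}\) and verifies that no randomly drawn matrix with orthonormal columns achieves a smaller value of \(\Vert \mathbf {X}- \mathbf {B}\mathbf {Q}^\top \Vert ^{2}\).

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 200, 10, 3                    # arbitrary sizes for the check
X = rng.standard_normal((n, d))         # stands in for the data matrix
B = rng.standard_normal((n, k))         # stands in for the (relaxed) membership matrix

# Closed-form solution of min_{Q : Q^T Q = I} ||X - B Q^T||^2 (Theorem 1)
U, _, Vt = np.linalg.svd(X.T @ B, full_matrices=False)   # U: d x k, Vt: k x k
Q_star = U @ Vt                                           # d x k, orthonormal columns
assert np.allclose(Q_star.T @ Q_star, np.eye(k))

def objective(Q):
    return np.linalg.norm(X - B @ Q.T, 'fro') ** 2

best = objective(Q_star)
for _ in range(1000):
    # random d x k matrix with orthonormal columns (reduced QR of a Gaussian matrix)
    Q_rand, _ = np.linalg.qr(rng.standard_normal((d, k)))
    assert objective(Q_rand) >= best - 1e-8
print("objective at Q_*:", best)
```

In an alternating scheme, this is how the rotation update can be carried out at each iteration: with the embedding and memberships fixed, form \(\mathbf {X}^\top \mathbf {B}\), take its SVD, and set \(\mathbf {Q}= \mathbf {U}\mathbf {V}^\top \).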

About this article

Cite this article

Labiod, L., Nadif, M. Efficient regularized spectral data embedding. Adv Data Anal Classif 15, 99–119 (2021). https://doi.org/10.1007/s11634-020-00386-8
