Abstract
Data embedding (DE) or dimensionality reduction techniques are particularly well suited to embedding high-dimensional data into a space that in most cases will have just two dimensions. Low-dimensional space, in which data samples (data points) can more easily be visualized, is also often used for learning methods such as clustering. Sometimes, however, DE will identify dimensions that contribute little in terms of the clustering structures that they reveal. In this paper we look at regularized data embedding by clustering, and we propose a simultaneous learning approach for DE and clustering that reinforces the relationships between these two tasks. Our approach is based on a matrix decomposition technique for learning a spectral DE, a cluster membership matrix, and a rotation matrix that closely maps out the continuous spectral embedding, in order to obtain a good clustering solution. We compare our approach with some traditional clustering methods and perform numerical experiments on a collection of benchmark datasets to demonstrate its potential.
Notes
The Laplacian provides a natural link between discrete representations (such as graphs) on the one hand, and continuous representations (such as vector spaces and manifolds) on the other.
Available at http://github.com/llabiod/RSDE.
References
Affeldt S, Labiod L, Nadif M (2019) Spectral clustering via ensemble deep autoencoder learning (SC-EDAE). arXiv:1901.02291
Ailem M, Role F, Nadif M (2016) Graph modularity maximization as an effective method for co-clustering text data. Knowl Based Syst 109:160–173
Bach FR, Jordan MI (2006) Learning spectral clustering, with application to speech separation. J Mach Learn Res 7:1963–2001
Banijamali E, Ghodsi A (2017) Fast spectral clustering using autoencoders and landmarks. In: International conference image analysis and recognition, Springer, pp 380–388
Ben-Hur A, Guyon I (2003) Detecting stable clusters using principal component analysis. In: Functional genomics, Springer, pp 159–182
Bock HH (1987) On the interface between cluster analysis, principal component analysis, and multidimensional scaling. In: Multivariate statistical modeling and data analysis, Springer, pp 17–34
Boutsidis C, Kambadur P, Gittens A (2015) Spectral clustering via the power method-provably. In: International conference on machine learning, pp 40–48
Celeux G, Govaert G (1995) Gaussian parsimonious clustering models. Pattern Recognit 28(5):781–793
Chan PK, Schlag MD, Zien JY (1994) Spectral k-way ratio-cut partitioning and clustering. IEEE Trans Comput Aided Des Integr Circuits Syst 13(9):1088–1096
Chang W (1983) On using principal components before separating a mixture of two multivariate normal distributions. Appl Stat 32:267–275
Chen X, Cai D (2011) Large scale spectral clustering with landmark-based representation. In: Twenty-fifth AAAI conference on artificial intelligence, pp 313–318
Chen W, Song Y, Bai H, Lin C, Chang E (2011) Parallel spectral clustering in distributed systems. IEEE Trans Pattern Anal Mach Intell 33:568–586
De Soete G, Carroll JD (1994) K-means clustering in a low-dimensional Euclidean space. In: New approaches in classification and data analysis, Springer, pp 212–219
Dhillon I, Guan Y, Kulis B (2004) Kernel k-means, spectral clustering and normalized cuts. In: ACM SIGKDD international conference on knowledge discovery and data mining, pp 551–556
Ding C, Li T (2007) Adaptive dimension reduction using discriminant analysis and k-means clustering. In: Proceedings of the 24th international conference on machine learning, ACM, pp 521–528
Ding C, He X, Zha H, Gu M, Simon H (2001) A min max cut algorithm for graph partitioning and data clustering. In: IEEE international conference on data mining (ICDM), pp 107–114
Ding C, He X, Simon HD (2005) On the equivalence of nonnegative matrix factorization and spectral clustering. In: Proceedings of the 2005 SIAM international conference on data mining, SIAM, pp 606–610
Ding C, Li T, Jordan M (2008) Nonnegative matrix factorization for combinatorial optimization: spectral clustering, graph matching, and clique finding. In: IEEE international conference on data mining (ICDM), pp 183–192
Engel D, Hüttenberger L, Hamann B (2012) A survey of dimension reduction methods for high-dimensional data analysis and visualization. In: OASIcs OpenAccess Series in Informatics, vol 27. Schloss Dagstuhl, Leibniz-Zentrum für Informatik, pp 135–149
Fowlkes C, Belongie S, Chung F, Malik J (2004) Spectral grouping using the Nyström method. IEEE Trans Pattern Anal Mach Intell 26(2):214–225
Gattone S, Rocci R (2012) Clustering curves on a reduced subspace. J Comput Gr Stat 21(2):361–379
Gittins R (1985) Canonical analysis: a review with applications in ecology. In: Biomathematics, vol 12, Springer, Berlin
Golub G, Loan CV (1996) Matrix computations, 3rd edn. Johns Hopkins University Press, Baltimore
Govaert G, Nadif M (2013) Co-clustering: models, algorithms and applications. Wiley, New York
Govaert G, Nadif M (2018) Mutual information, phi-squared and model-based co-clustering for contingency tables. Adv Data Anal Classif 12(3):455–488
Hinton G, Salakhutdinov R (2006) Reducing the dimensionality of data with neural networks. Science 313(5786):504–507
Ji P, Zhang T, Li H, Salzmann M, Reid I (2017) Deep subspace clustering networks. In: Guyon I, Luxburg UV, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R (eds) Advances in neural information processing systems, vol 30, pp 24–33
Lee H, Battle A, Raina R, Ng A (2007) Efficient sparse coding algorithms. In: Advances in neural information processing systems (NIPS), pp 801–808
Leyli-Abadi M, Labiod L, Nadif M (2017) Denoising autoencoder as an effective dimensionality reduction and clustering of text data. In: Pacific-Asia conference on knowledge discovery and data mining, Springer, pp 801–813
Liu W, He J, Chang S (2010) Large graph construction for scalable semi-supervised learning. In: Proceedings of the 27th international conference on machine learning (ICML-10), pp 679–686
Luo D, Huang H, Ding C, Nie F (2010) On the eigenvectors of p-Laplacian. Mach Learn 81(1):37–51
von Luxburg U (2007) A tutorial on spectral clustering. Stat Comput 17(4):395–416
Ng A, Jordan M, Weiss Y (2001) On spectral clustering: analysis and an algorithm. In: Advances in neural information processing systems (NIPS), pp 849–856
Nie F, Ding C, Luo D, Huang H (2010) Improved minmax cut graph clustering with nonnegative relaxation. In: European conference on machine learning and practice of knowledge discovery in databases (ECML/PKDD), vol 6322, pp 451–466
Role F, Morbieu S, Nadif M (2019) Coclust: a python package for co-clustering. J Stat Softw 88(7):1–29
Salah A, Nadif M (2017) Model-based von Mises–Fisher co-clustering with a conscience. In: Proceedings of the 2017 SIAM international conference on data mining, SIAM, pp 246–254
Salah A, Nadif M (2019) Directional co-clustering. Adv Data Anal Classif 13(3):591–620
Schölkopf B, Smola A, Müller KR (1997) Kernel principal component analysis. In: International conference on artificial neural networks, Lausanne, Switzerland. Springer, pp 583–588
Schonemann P (1966) A generalized solution of the orthogonal procrustes problem. Psychometrika 31(1):1–10
Scrucca L (2010) Dimension reduction for model-based clustering. Stat Comput 20(4):471–484
Seuret M, Alberti M, Liwicki M, Ingold R (2017) PCA-initialized deep neural networks applied to document image analysis. In: 14th IAPR international conference on document analysis and recognition, ICDAR 2017, Kyoto, Japan, November 9–15, 2017, pp 877–882
Shi J, Malik J (2000) Normalized cuts and image segmentation. IEEE Trans Pattern Anal Mach Intell 22(8):888–905
Shinnou H, Sasaki M (2008) Spectral clustering for a large data set by reducing the similarity matrix size. In: Proceedings of the sixth international conference on language resources and evaluation (LREC), pp 201–2014
Strehl A, Ghosh J (2002) Cluster ensembles: a knowledge reuse framework for combining multiple partitions. J Mach Learn Res 3:583–617
ten Berge JM (1993) Least squares optimization in multivariate analysis. DSWO Press, Leiden
Tian K, Zhou S, Guan J (2017) Deepcluster: a general clustering framework based on deep learning. In: Ceci M, Hollmén J, Todorovski L, Vens C, Džeroski S (eds) Machine learning and knowledge discovery in databases
Van Der Maaten L, Postma E, Van den Herik J (2009) Dimensionality reduction: a comparative review. J Mach Learn Res 10:66–71
Vichi M, Kiers H (2001) Factorial k-means analysis for two-way data. Comput Stat Data Anal 37(1):49–64
Vichi M, Saporta G (2009) Clustering and disjoint principal component analysis. Comput Stat Data Anal 53(8):3194–3208
Vidal R (2011) Subspace clustering. IEEE Signal Process Mag 28(2):52–68
Wang S, Ding Z, Fu Y (2017) Feature selection guided auto-encoder. In: Thirty-first conference on artificial intelligence (AAAI), pp 2725–2731
Xie J, Girshick R, Farhadi A (2016) Unsupervised deep embedding for clustering analysis. In: International conference on machine learning, pp 478–487
Yamamoto M (2012) Clustering of functional data in a low-dimensional subspace. Adv Data Anal Classif 6(3):219–247
Yamamoto M, Hwang H (2014) A general formulation of cluster analysis with dimension reduction and subspace separation. Behaviormetrika 41(1):115–129
Yang L, Cao X, He D, Wang C, Wang X, Zhang W (2016) Modularity based community detection with deep learning. In: Proceedings of the twenty-fifth international joint conference on artificial intelligence (IJCAI), pp 2252–2258
Yang B, Fu X, Sidiropoulos N, Hong M (2017) Towards k-means-friendly spaces: simultaneous deep learning and clustering. In: Proceedings of the 34th international conference on machine learning (ICML), pp 3861–3870
Yan D, Huang L, Jordan M (2009) Fast approximate spectral clustering. In: Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, pp 907–916
Ye J, Zhao Z, Wu M (2008) Discriminative k-means for clustering. In: Advances in neural information processing systems, pp 1649–1656
Yuan Z, Yang Z, Oja E (2009) Projective nonnegative matrix factorization: sparseness, orthogonality, and clustering. Neural Process Lett 2009:11–13
Zha H, He X, Ding C, Simon H, Gu M (2002) Spectral relaxation for k-means clustering. In: Advances in neural information processing systems (NIPS), MIT Press, pp 1057–1064
Zhirong Z, Laaksonen J (2007) Projective nonnegative matrix factorization with applications to facial image processing. Int J Pattern Recognit Artif Intell 21(8):1353–1362
Appendix A: Proof of Theorem 1
The solution of (5) comes from the singular value decomposition (SVD) of \(\mathbf {X}^\top \mathbf {B}\). Let \(\mathbf {U}\mathbf {D}\mathbf {V}^\top \) be the SVD of \(\mathbf {X}^\top \mathbf {B}\); then \(\mathbf {Q}_{*} = \mathbf {U}\mathbf {V}^\top \).
Proof
We expand the matrix norm as \(\Vert \mathbf {X}-\mathbf {B}\mathbf {Q}^\top \Vert ^2 = Tr(\mathbf {X}^\top \mathbf {X}) - 2Tr(\mathbf {Q}^\top \mathbf {X}^\top \mathbf {B}) + Tr(\mathbf {Q}\mathbf {B}^\top \mathbf {B}\mathbf {Q}^\top ).\)
Since \(\mathbf {Q}^\top \mathbf {Q}= \mathbf {I}\), the last term is equal to \(Tr(\mathbf {B}^\top \mathbf {B})\); hence the original minimization problem (4) is equivalent to maximizing the middle term, i.e. (5).
With the SVD \(\mathbf {X}^\top \mathbf {B} = \mathbf {U}\mathbf {D}\mathbf {V}^\top \), the middle term becomes \(Tr(\mathbf {Q}^\top \mathbf {U}\mathbf {D}\mathbf {V}^\top ) = Tr(\mathbf {V}^\top \mathbf {Q}^\top \mathbf {U}\mathbf {D}) = Tr(\hat{\mathbf {Q}}^\top \mathbf {U}\mathbf {D}) = \sum _{i=1}^{k} d_i \varvec{\hat{q}}_i^\top \mathbf {u}_i,\)
where \(\hat{\mathbf {Q}} = \mathbf {Q}\mathbf {V}\) and \(\mathbf {D}=\mathbf {Diag}(d_1,\ldots ,d_k) \in {\mathbb {R}}_{+}^{k \times k}\). In vector form we have \(\mathbf {U}=[\mathbf {u}_1|\ldots |\mathbf {u}_k]\in {\mathbb {R}}^{d \times k}\) and \(\hat{\mathbf {Q}}=[\varvec{\hat{q}}_1|\ldots |\varvec{\hat{q}}_k] \in {\mathbb {R}}^{d \times k}\). Applying the Cauchy–Schwarz inequality, and noting that \(\mathbf {U}^\top \mathbf {U}=\mathbf {I}\) and \(\hat{\mathbf {Q}}^\top \hat{\mathbf {Q}} = \mathbf {V}^\top \mathbf {Q}^\top \mathbf {Q}\mathbf {V} = \mathbf {I}\) (since \(\mathbf {V}\) is orthogonal), we get \(\sum _{i=1}^{k} d_i \varvec{\hat{q}}_i^\top \mathbf {u}_i \le \sum _{i=1}^{k} d_i.\)
This upper bound is attained by setting \(\hat{\mathbf {Q}}= \mathbf {U}\), i.e., \(\mathbf {Q}\mathbf {V}= \mathbf {U}\); multiplying on the right by \(\mathbf {V}^\top \) gives \(\mathbf {Q}\mathbf {V}\mathbf {V}^\top = \mathbf {U}\mathbf {V}^\top \). Hence \(\mathbf {Q}_* = \mathbf {U}\mathbf {V}^\top .\)
\(\square \)
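As a sanity check, the closed form of Theorem 1 is easy to verify numerically. The following is a minimal NumPy sketch (not the authors' code; matrix sizes and variable names are illustrative): it computes \(\mathbf {Q}_* = \mathbf {U}\mathbf {V}^\top \) from the SVD of \(\mathbf {X}^\top \mathbf {B}\) and checks that no randomly drawn feasible \(\mathbf {Q}\) achieves a larger value of \(Tr(\mathbf {Q}^\top \mathbf {X}^\top \mathbf {B})\).

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 100, 10, 3

# Illustrative data matrix X (n x d) and column-orthonormal B (n x k).
X = rng.standard_normal((n, d))
B, _ = np.linalg.qr(rng.standard_normal((n, k)))

# Closed form of max_Q Tr(Q^T X^T B) s.t. Q^T Q = I:
# with SVD X^T B = U D V^T, the maximizer is Q* = U V^T.
U, D, Vt = np.linalg.svd(X.T @ B, full_matrices=False)
Q_star = U @ Vt

# At the optimum the objective equals the sum of singular values.
obj_star = np.trace(Q_star.T @ X.T @ B)
assert np.isclose(obj_star, D.sum())

# Sanity check: random feasible Q (orthonormal columns) never does better.
for _ in range(100):
    Q, _ = np.linalg.qr(rng.standard_normal((d, k)))
    assert np.trace(Q.T @ X.T @ B) <= obj_star + 1e-9
```

The objective value at \(\mathbf {Q}_*\) equals \(\sum _i d_i\), which matches the upper bound attained in the proof.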
Remark 1
Note that the problem in (5) is closely related to the Orthogonal Procrustes Problem (OPP) (Schonemann, 1966), in which \(\mathbf {Q}\) is a square orthogonal rotation matrix (i.e., \(\mathbf {Q}^\top \mathbf {Q}=\mathbf {Q}\mathbf {Q}^\top = \mathbf {I}\)). The only difference between the problem in Theorem 1 and the OPP is thus the constraint: in Theorem 1, \(\mathbf {Q}\) is merely required to have orthonormal columns (\(\mathbf {Q}^\top \mathbf {Q}=\mathbf {I}\)) rather than to be square orthogonal.
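For the square case described in this remark, SciPy provides a ready-made solver, scipy.linalg.orthogonal_procrustes. The sketch below (with illustrative random matrices, not data from the paper) checks that it returns the same \(\mathbf {U}\mathbf {V}^\top \) closed form as Theorem 1, just with a square \(\mathbf {R}\).

```python
import numpy as np
from scipy.linalg import orthogonal_procrustes

rng = np.random.default_rng(1)
n, d = 50, 4

# OPP: min_R ||A R - C||_F over square orthogonal R
# (R^T R = R R^T = I), cf. Schonemann (1966).
A = rng.standard_normal((n, d))
C = rng.standard_normal((n, d))

R, _ = orthogonal_procrustes(A, C)

# Same closed form as Theorem 1, with a square matrix:
# SVD of A^T C = U D V^T gives R = U V^T.
U, D, Vt = np.linalg.svd(A.T @ C)
assert np.allclose(R, U @ Vt)
```

Since \(\mathbf {R}\) is square here, \(\mathbf {U}\mathbf {V}^\top \) is fully orthogonal, whereas Theorem 1 only guarantees orthonormal columns.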
Labiod, L., Nadif, M. Efficient regularized spectral data embedding. Adv Data Anal Classif 15, 99–119 (2021). https://doi.org/10.1007/s11634-020-00386-8