
Deep embedded self-organizing maps for joint representation learning and topology-preserving clustering

  • Original Article
  • Neural Computing and Applications

Abstract

A recent research area in unsupervised learning is the combination of representation learning with deep neural networks and data clustering. While the success of deep learning for supervised tasks is well established, recent research has demonstrated that, with suitable regularization, neural networks can learn intermediate representations that improve clustering in their feature space. By treating representation learning and clustering as a joint task, models learn clustering-friendly spaces and outperform two-stage approaches in which dimensionality reduction and clustering are performed separately. Recently, this idea has been extended to topology-preserving clustering models known as self-organizing maps (SOM). This work is a thorough study of the deep embedded self-organizing map (DESOM), a model composed of an autoencoder and a SOM layer that jointly trains the code vectors and the network weights to learn SOM-friendly representations. In other words, the SOM induces a form of regularization that improves the quality of quantization and topology in latent space. After detailing the architecture, loss and training algorithm, we study the hyperparameters through a series of experiments. Different SOM-based models are evaluated in terms of clustering, visualization and classification on benchmark datasets. We examine the benefits and trade-offs of joint representation learning and self-organization. DESOM achieves competitive results, requires no pretraining and produces topologically organized visualizations.


Notes

  1. https://github.com/FlorentF9/DESOM.

  2. https://github.com/FlorentF9/SOMperf.

  3. https://github.com/FlorentF9/DESOM.

  4. https://github.com/FlorentF9/SOMperf.

  5. https://github.com/JustGlowing/minisom.

  6. https://github.com/FlorentF9/SOMperf.

References

  1. Aljalbout E, Golkov V, Siddiqui Y, Cremers D (2018) Clustering with deep learning: taxonomy and new methods. arXiv:1801.07648

  2. Arpit D, Zhou Y, Ngo H, Govindaraju V (2016) Why regularized auto-encoders learn sparse representation? In: International Conference on Machine Learning (ICML) 1:211–223

  3. Bauer HU, Pawelzik K, Geisel T (1992) A topographic product for the optimization of self-organizing feature maps. NIPS 4:1141–1147

  4. Carniel R, Jolly AD, Barbui L (2013) Analysis of phreatic events at Ruapehu volcano, New Zealand using a new SOM approach. J Volcanol Geotherm Res 254:69–79. https://doi.org/10.1016/j.jvolgeores.2012.12.026

  5. Côme E, Cottrell M, Verleysen M, Lacaille J (2011) Aircraft engine fleet monitoring using self-organizing maps and edit distance. In: International Workshop on Self-Organizing Maps and Learning Vector Quantization, Clustering and Data Visualization (WSOM), pp 298–307

  6. DeVries T, Taylor GW (2017) Dataset augmentation in feature space. In: ICLR Workshop

  7. Diday E, Simon JC (1976) Clustering analysis. Springer, Berlin, Heidelberg, pp 47–94. https://doi.org/10.1007/978-3-642-96303-2_3

  8. Dilokthanakul N, Mediano PA, Garnelo M, Lee MC, Salimbeni H, Arulkumaran K, Shanahan M (2017) Deep unsupervised clustering with Gaussian mixture variational autoencoders

  9. Ghasedi Dizaji K, Herandi A, Deng C, Cai W, Huang H (2017) Deep clustering via joint convolutional autoencoder embedding and relative entropy minimization. In: ICCV, pp 5747–5756

  10. Elend L, Kramer O (2019) Self-organizing maps with convolutional layers. In: International Workshop on Self-Organizing Maps and Learning Vector Quantization, Clustering and Data Visualization (WSOM)

  11. Fard MM, Thonet T, Gaussier E (2018) Deep k-means: jointly clustering with k-means and learning representations. arXiv:1806.10069

  12. Faure C, Olteanu M, Bardet JM, Lacaille J (2017) Using self-organizing maps for clustering and labelling aircraft engine data phases. In: International Workshop on Self-Organizing Maps and Learning Vector Quantization, Clustering and Data Visualization (WSOM)

  13. Ferles C, Papanikolaou Y, Naidoo KJ (2018) Denoising autoencoder self-organizing map (DASOM). Neural Netw 105:112–131. https://doi.org/10.1016/j.neunet.2018.04.016

  14. Forest F, Cochard Q, Noyer C, Cabut A, Joncour M, Lacaille J, Lebbah M, Azzag H (2020) Large-scale vibration monitoring of aircraft engines from operational data using self-organized models. In: Annual Conference of the PHM Society

  15. Forest F, Lacaille J, Lebbah M, Azzag H (2018) A generic and scalable pipeline for large-scale analytics of continuous aircraft engine data. In: IEEE International Conference on Big Data

  16. Forest F, Lebbah M, Azzag H, Lacaille J (2019) Deep architectures for joint clustering and visualization with self-organizing maps. In: PAKDD Workshop on Learning Data Representations for Clustering (LDRC)

  17. Forest F, Lebbah M, Azzag H, Lacaille J (2019) Deep embedded SOM: joint representation learning and self-organization. In: European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN)

  18. Forest F, Lebbah M, Azzag H, Lacaille J (2020) A survey and implementation of performance metrics for self-organized maps. arXiv:2011.05847

  19. Fortuin V, Hüser M, Locatello F, Strathmann H, Rätsch G (2019) SOM-VAE: interpretable discrete representation learning on time series

  20. Guo X, Gao L, Liu X, Yin J (2017) Improved deep embedded clustering with local structure preservation. In: International Joint Conference on Artificial Intelligence (IJCAI), pp 1753–1759

  21. Guo X, Liu X, Zhu E, Yin J (2017) Deep clustering with convolutional autoencoders. In: ICONIP

  22. Harchaoui W, Mattei PA, Alamansa A, Bouveyron C (2018) Wasserstein adversarial mixture clustering. https://hal.archives-ouvertes.fr/hal-01827775/

  23. Hinton GE, Salakhutdinov R (2006) Reducing the dimensionality of data with neural networks. Science 313:504–507

  24. Jiang Z, Zheng Y, Tan H, Tang B, Zhou H (2017) Variational deep embedding: an unsupervised and generative approach to clustering. In: International Joint Conference on Artificial Intelligence (IJCAI), pp 1965–1972

  25. Kaski S, Lagus K (1996) Comparing self-organizing maps

  26. Kingma DP, Welling M (2014) Stochastic gradient VB and the variational auto-encoder. In: International Conference on Learning Representations (ICLR). arXiv:1312.6114

  27. Kohonen T (1982) Self-organized formation of topologically correct feature maps. Biol Cybern 43(1):59–69

  28. Kohonen T (1990) The self-organizing map. Proc IEEE 78:1464–1480

  29. LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324

  30. Lewis DD, Yang Y, Rose TG, Li F (2004) RCV1: a new benchmark collection for text categorization research. J Mach Learn Res 5:361–397

  31. Liu Z, Cao J, Chen S, Lu Y, Tan F (2020) Visualization analysis of seismic facies based on deep embedded SOM. IEEE Geosci Remote Sens Lett 18(8):1491–1495

  32. Ma Q, Zheng J, Li S, Cottrell GW (2019) Learning representations for time series clustering. In: NeurIPS

  33. Madaan P, Maiti A (2019) Deep mean shift clustering. PhD thesis, Indraprastha Institute of Information Technology

  34. Manduchi L, Hüser M, Rätsch G, Fortuin V (2020) DPSOM: deep probabilistic clustering with self-organizing maps. arXiv:1910.01590

  35. McInnes L, Healy J, Melville J (2018) UMAP: uniform manifold approximation and projection for dimension reduction. arXiv:1802.03426

  36. Medeiros HR, Braga PHM, Bassani HF (2020) Deep clustering self-organizing maps with relevance learning. In: ICML LatinX in AI Research Workshop

  37. Min E, Guo X, Liu Q, Zhang G, Cui J, Long J (2018) A survey of clustering with deep learning: from the perspective of network architecture. IEEE Access 6:39501–39514

  38. Mukherjee S, Asnani H, Lin E, Kannan S (2019) ClusterGAN: latent space clustering in generative adversarial networks. In: Proceedings of the AAAI Conference on Artificial Intelligence 33:4610–4617. https://aaai.org/ojs/index.php/AAAI/article/view/4385

  39. Ng A (2011) Sparse autoencoder. Technical report, Stanford University. https://web.stanford.edu/class/cs294a/sparseAutoencoder.pdf

  40. Pesteie M, Abolmaesumi P, Rohling R (2018) Deep neural maps. In: ICML Workshop. arXiv:1810.07291

  41. Rezende DJ, Mohamed S, Wierstra D (2014) Stochastic backpropagation and approximate inference in deep generative models. In: International Conference on Machine Learning (ICML) 4:3057–3070

  42. Song C, Huang Y, Liu F, Wang Z, Wang L (2014) Deep auto-encoder based clustering. Intell Data Anal 18(6):65–76

  43. Ullah A, Haydarov K, Haq IU, Muhammad K, Rho S, Lee M, Baik SW (2020) Deep learning assisted buildings energy consumption profiling using smart meter data. Sensors 20(3):873

  44. van den Oord A, Vinyals O, Kavukcuoglu K (2017) Neural discrete representation learning. In: NIPS. arXiv:1711.00937

  45. Villmann T, Biehl M, Villmann A, Saralajew S (2017) Fusion of deep learning architectures, multilayer feedforward networks and learning vector quantizers for deep classification learning. In: International Workshop on Self-Organizing Maps and Learning Vector Quantization, Clustering and Data Visualization (WSOM)

  46. Vincent P, Larochelle H, Lajoie I, Bengio Y, Manzagol PA (2010) Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. J Mach Learn Res 11:3371–3408

  47. Wu H, Flierl M (2020) Vector quantization-based regularization for autoencoders. In: Proceedings of the AAAI Conference on Artificial Intelligence. arXiv:1905.11062

  48. Xiao H, Rasul K, Vollgraf R (2017) Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. arXiv:1708.07747

  49. Xie J, Girshick R, Farhadi A (2016) Unsupervised deep embedding for clustering analysis. In: International Conference on Machine Learning (ICML), vol 48. arXiv:1511.06335

  50. Yang B, Fu X, Sidiropoulos ND, Hong M (2017) Towards k-means-friendly spaces: simultaneous deep learning and clustering. In: International Conference on Machine Learning (ICML). arXiv:1610.04794

  51. Zhu D, Han T, Zhou L, Yang X, Wu YN (2019) Deep unsupervised clustering with clustered generator model. arXiv:1911.08459


Acknowledgements

This research was funded by the French agency for research and technology (ANRT) through the CIFRE Grant 2017/1279 and by Safran Aircraft Engines (Safran group).

Author information


Corresponding author

Correspondence to Florent Forest.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1: performance evaluation

Quantitative evaluation of self-organizing maps is not as straightforward as for supervised classification tasks. To assess and compare the performance of models, we implemented and evaluated a collection of metrics developed in the related literature. This literature spans almost 30 years, and implementations are not easy to find; we therefore implemented these metrics and provide them as an open-source library, SOMperf (see footnote 6) [18]. SOM performance metrics can first be categorized into two families:

  1. Clustering metrics. Any clustering quality measure that relies solely on the prototype vectors and not on their topological organization. This encompasses all quality indices used in the clustering literature (e.g., purity, normalized mutual information (NMI), Rand index).

  2. Topographic metrics. Under this term, we group quality measures that, on the contrary, assess the topological organization of the model. Some of these indices also evaluate clustering quality, but we call a metric topographic as soon as it incorporates the map topology.

On another level, we can also classify them into two well-known families, depending on the use or not of ground-truth label knowledge:

  1. Internal indices, using only intrinsic properties of the model and the data.

  2. External indices, relying on external ground-truth class labels to evaluate results, as in supervised classification.

For instance, quantization error falls into the clustering metric category (it measures how well the SOM cluster centers fit the data distribution, without using any topology information) and is an internal index (it does not depend on external labels). On the other hand, the Class Scatter Index (introduced in [10]) is a topographic metric and an external index, as it measures how ground-truth class labels are organized into groups of neighboring map units.

1.1 Clustering metrics

1.1.1 Internal indices

Quantization error Quantization error is the average error made when projecting data onto the SOM, i.e., the mean Euclidean distance between a data sample and its best-matching unit. It can be measured in the original space, using the prototypes reconstructed by the decoder (\(\text {QE}\)), or in latent space (\(\hat{\text {QE}}\)). We introduce the notations \({\tilde{b}}_i\) for the best-matching unit of data point \({\mathbf {x}}_i\) in original space, and \(b_i\) for the best-matching unit of \({\mathbf {z}}_i\) in latent space.

$$\begin{aligned}&\text {QE}(\{\tilde{{\mathbf {m}}}_k\}, {\mathbb {X}}) = \frac{1}{N} \sum _{i=1}^N ||{\mathbf {x}}_i - \tilde{{\mathbf {m}}}_{{\tilde{b}}_i}||_2\\&\hat{\text {QE}}(\{{\mathbf {m}}_k\}, {\mathbb {Z}}) = \frac{1}{N} \sum _{i=1}^N ||{\mathbf {z}}_i - {\mathbf {m}}_{b_i}||_2 \end{aligned}$$
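For concreteness, the snippet below is a minimal NumPy sketch of this computation (SOMperf provides reference implementations; the function and variable names here are ours). The same function yields \(\text {QE}\) or \(\hat{\text {QE}}\) depending on whether decoded prototypes and original data, or latent prototypes and latent codes, are passed in.

```python
import numpy as np

def quantization_error(prototypes, data):
    """Mean Euclidean distance between each sample and its best-matching unit.

    prototypes: (K, d) array of prototype vectors (decoded or latent).
    data:       (N, d) array of samples living in the same space.
    """
    # Pairwise Euclidean distances between samples and prototypes: (N, K)
    dists = np.linalg.norm(data[:, None, :] - prototypes[None, :, :], axis=-1)
    # Distance to the best-matching unit of each sample, averaged over samples
    return float(dists.min(axis=1).mean())
```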

Prototype sharpness ratio In the case of images, we want the prototype images to be realistic and as sharp as the original images. When visualizing a self-organized map of an image dataset, we usually display the image corresponding to each prototype vector. If these images are blurry (due to the averaging induced by the SOM algorithm), the visualization will be of poor quality even if the quantization error is low (because QE is only an average Euclidean distance). We chose a very simple sharpness measure for an image: the average squared norm of its 2D pixel gradients. The sharpness of a SOM is defined as the average sharpness of its prototypes, which can then be compared with the average sharpness of the images in the original dataset. We introduce the prototype sharpness ratio (PSR), defined as follows:

$$\begin{aligned} \text {PSR}(\{\tilde{{\mathbf {m}}}_k\}, {\mathbb {X}}) = \frac{\text {average prototype sharpness}}{\text {average dataset sharpness}} = \frac{\frac{1}{K} \sum _{k=1}^K ||\nabla _{\text {2D}}\tilde{{\mathbf {m}}}_k||_2^2}{\frac{1}{N} \sum _{i=1}^N ||\nabla _{\text {2D}}{\mathbf {x}}_i||_2^2} \end{aligned}$$

A score lower than 1 means that the prototypes are on average blurrier than the original images; a score larger than 1 means they are less blurry (i.e., crisper or noisier) than the originals. The closer the PSR is to 1, the better.
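The following sketch illustrates this computation for a batch of grayscale images of identical size; it is an illustration under that assumption, not the exact implementation used in our experiments. Averaging over pixels instead of summing per image only rescales numerator and denominator by the same constant, so the ratio is unchanged.

```python
import numpy as np

def mean_sharpness(images):
    """Mean squared norm of 2D pixel gradients over a batch of (N, H, W) images."""
    gy, gx = np.gradient(images.astype(float), axis=(1, 2))
    return float(np.mean(gx ** 2 + gy ** 2))

def prototype_sharpness_ratio(decoded_prototypes, data):
    """PSR: sharpness of decoded prototype images relative to the dataset sharpness."""
    return mean_sharpness(decoded_prototypes) / mean_sharpness(data)
```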

1.1.2 External indices

A clustering with K clusters is described by the sets of data points belonging to each cluster, denoted \({\mathbf {Q}} = \{Q_k\}, k = 1 \ldots K\). In order to define the external clustering criteria, we assume that a label is associated with each data point, corresponding to a set of C different classes. We denote by \({\mathbf {Y}} = \{Y_j\}, j = 1 \ldots C\) the sets of elements belonging to each class.

Purity Purity is one of the most commonly used external quality indices. It measures the purity of clusters with respect to ground-truth class labels. To compute the purity of a clustering \({\mathbf {Q}}\), each cluster is assigned to the class which is most frequent in the cluster, and then the accuracy of this assignment is measured by counting the number of correctly assigned points and dividing by the total number of points. Formally:

$$\begin{aligned} \text {Pur}({\mathbf {Q}}, {\mathbf {Y}}) = \frac{1}{N} \sum _{k=1}^K \underset{j = 1 \ldots C}{\max } |Q_k \cap Y_j| \end{aligned}$$
(9)

High purity is easy to achieve when the number of data points per cluster is small; in particular, purity equals 1 if each point gets its own cluster. Thus, purity cannot be used to trade off the validity of the clustering against the number of clusters.
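A minimal sketch of purity, assuming integer-encoded cluster assignments and class labels (names are ours, not the SOMperf API):

```python
import numpy as np

def purity(y_pred, y_true):
    """Purity: assign each cluster to its majority class and measure the accuracy.

    y_pred: (N,) integer cluster assignments.
    y_true: (N,) integer ground-truth class labels.
    """
    correct = 0
    for k in np.unique(y_pred):
        # Size of the largest intersection |Q_k ∩ Y_j| over classes j
        labels_in_cluster = y_true[y_pred == k]
        correct += np.bincount(labels_in_cluster).max()
    return correct / len(y_true)
```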

Normalized mutual information Normalized mutual information (NMI) is also one of the most widespread external clustering indices:

$$\begin{aligned} \text {NMI}({\mathbf {Q}}, {\mathbf {Y}}) = \frac{I({\mathbf {Q}}, {\mathbf {Y}})}{\frac{1}{2}\left( H({\mathbf {Q}}) + H({\mathbf {Y}})\right) } \end{aligned}$$

where I is mutual information:

$$\begin{aligned} I({\mathbf {Q}}, {\mathbf {Y}}) = \sum _k \sum _j P(Q_k \cap Y_j) \log \frac{P(Q_k \cap Y_j)}{P(Q_k)P(Y_j)} = \sum _k \sum _j \frac{|Q_k \cap Y_j|}{N} \log \frac{N|Q_k \cap Y_j|}{|Q_k||Y_j|} \end{aligned}$$

and H is entropy:

$$\begin{aligned} H({\mathbf {Q}}) = -\sum _k P(Q_k) \log P(Q_k) = -\sum _k \frac{|Q_k|}{N} \log \frac{|Q_k|}{N} \end{aligned}$$
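Putting the three definitions together, a straightforward (if unoptimized) sketch of NMI is given below; scikit-learn's normalized_mutual_info_score can also be used for this purpose.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a labeling, H(Q) or H(Y)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))

def nmi(y_pred, y_true):
    """Normalized mutual information: I(Q, Y) / (0.5 * (H(Q) + H(Y)))."""
    n = len(y_true)
    mi = 0.0
    for k in np.unique(y_pred):
        for j in np.unique(y_true):
            n_kj = np.sum((y_pred == k) & (y_true == j))
            if n_kj == 0:
                continue  # empty intersections contribute nothing
            n_k = np.sum(y_pred == k)
            n_j = np.sum(y_true == j)
            mi += (n_kj / n) * np.log(n * n_kj / (n_k * n_j))
    return mi / (0.5 * (entropy(y_pred) + entropy(y_true)))
```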

Unsupervised clustering accuracy Accuracy is the most common metric in supervised classification: the number of examples assigned to the correct class divided by the total number of examples. It can also be used in clustering (i.e., unsupervised classification) as an external quality measure, provided labels are available and the number of clusters equals the number of classes. It is the accuracy of the resulting classification under the best one-to-one mapping between clusters and class labels. Denoting this mapping by m, unsupervised clustering accuracy is:

$$\begin{aligned} \text {Acc}({\mathbf {Q}}, {\mathbf {Y}}) = \frac{1}{N} \underset{m}{\max }\sum _{k=1}^K |Q_k \cap Y_{m(k)}| \end{aligned}$$
(10)

The best mapping can be found using the Hungarian assignment algorithm, also known as the Kuhn–Munkres algorithm.
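As a sketch, the assignment can be computed with SciPy's linear_sum_assignment: build the cluster-class contingency table and pick the one-to-one mapping that maximizes the matched counts. Labels are assumed integer-encoded; names are ours.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(y_pred, y_true):
    """Unsupervised clustering accuracy using the best one-to-one cluster-class mapping."""
    clusters = np.unique(y_pred)
    classes = np.unique(y_true)
    # Contingency table: w[a, b] = |Q_a ∩ Y_b|
    w = np.zeros((len(clusters), len(classes)), dtype=np.int64)
    for a, k in enumerate(clusters):
        for b, j in enumerate(classes):
            w[a, b] = np.sum((y_pred == k) & (y_true == j))
    # Hungarian algorithm on the negated table maximizes the matched mass
    row_ind, col_ind = linear_sum_assignment(-w)
    return w[row_ind, col_ind].sum() / len(y_true)
```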

1.2 Topographic metrics

Distortion Distortion is the loss function minimized by the SOM learning algorithm. It is similar to the within-cluster sum of squared errors (WSSE) minimized by k-means, but with an additional topology constraint introduced by the neighborhood function. It is computed as the sum of squared Euclidean distances between each data sample and every map prototype, each term weighted by the neighborhood function evaluated at the topographic distance between that prototype's unit and the sample's best-matching unit. Like quantization error, it can be measured in the original data space or in latent space:

$$\begin{aligned}&\text {D}(\{\tilde{{\mathbf {m}}}_k\}, {\mathbb {X}}, T) = \frac{1}{N} \sum _{i=1}^N \sum _{k=1}^K {\mathcal {K}}^T\left( \delta ({\tilde{b}}_i, k)\right) ||{\mathbf {x}}_i - \tilde{{\mathbf {m}}}_k||_2^2\\&\hat{\text {D}}(\{{\mathbf {m}}_k\}, {\mathbb {Z}}, T) = \frac{1}{N} \sum _{i=1}^N \sum _{k=1}^K {\mathcal {K}}^T\left( \delta (b_i, k)\right) ||{\mathbf {z}}_i - {\mathbf {m}}_k||_2^2 \end{aligned}$$

In DESOM, the SOM loss corresponds to latent distortion.
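As an illustration, here is a sketch of distortion on a rectangular map, assuming a Gaussian neighborhood function over the Manhattan distance between grid positions; other kernels and map metrics can be substituted, and SOMperf contains the reference implementation.

```python
import numpy as np

def grid_distances(map_rows, map_cols):
    """Pairwise Manhattan distances delta(k, l) between units of a rectangular map."""
    pos = np.array([(r, c) for r in range(map_rows) for c in range(map_cols)])
    return np.abs(pos[:, None, :] - pos[None, :, :]).sum(axis=-1)

def distortion(prototypes, data, map_rows, map_cols, T):
    """SOM distortion with a Gaussian neighborhood of temperature T.

    prototypes: (K, d) prototypes, ordered row by row on the map.
    data:       (N, d) samples in the same space (original or latent).
    """
    delta = grid_distances(map_rows, map_cols)                 # (K, K) grid distances
    h = np.exp(-delta.astype(float) ** 2 / (2.0 * T ** 2))     # neighborhood weights
    d2 = ((data[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=-1)  # (N, K)
    bmus = d2.argmin(axis=1)                                   # best-matching units
    # Neighborhood-weighted sum over all units, averaged over samples
    return float(np.mean((h[bmus] * d2).sum(axis=1)))
```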

Topographic error Topographic error assesses the self-organization of a SOM model. It is calculated as the fraction of samples whose best and second-best matching units are not neighbors on the map. In other words, this error quantifies the smoothness of projections on the self-organized map. Using the notations \({\tilde{b}}^k_i\) and \(b^k_i\) for the k-th best-matching units of \({\mathbf {x}}_i\) and \({\mathbf {z}}_i\) in original and latent space, respectively, we define topographic error \(\text {TE}\) and latent topographic error \(\hat{\text {TE}}\):

$$\begin{aligned}&\text {TE}(\{\tilde{{\mathbf {m}}}_k\}, {\mathbb {X}}) = \frac{1}{N} \sum _{i=1}^N \mathbb {1}_{\delta ({\tilde{b}}^1_i, {\tilde{b}}^2_i)> 1}\\&\hat{\text {TE}}(\{{\mathbf {m}}_k\}, {\mathbb {Z}}) = \frac{1}{N} \sum _{i=1}^N \mathbb {1}_{\delta (b^1_i, b^2_i) > 1} \end{aligned}$$
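A sketch for a rectangular map, where two units are considered neighbors if their Manhattan grid distance is 1 (i.e., \(\delta (b^1, b^2) \le 1\)); same assumptions and naming conventions as above.

```python
import numpy as np

def topographic_error(prototypes, data, map_rows, map_cols):
    """Fraction of samples whose two best-matching units are not adjacent on the map."""
    pos = np.array([(r, c) for r in range(map_rows) for c in range(map_cols)])
    d2 = ((data[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=-1)  # (N, K)
    two_best = np.argsort(d2, axis=1)[:, :2]          # BMU and second BMU of each sample
    # Manhattan distance between the two BMUs on the grid
    grid_dist = np.abs(pos[two_best[:, 0]] - pos[two_best[:, 1]]).sum(axis=1)
    return float(np.mean(grid_dist > 1))
```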

Combined error Combined error [25] is an error measure that combines and extends quantization and topographic errors. Its computation is more involved than the previous indices. For a given data sample \({\mathbf {x}}_i\), we first compute its two best-matching units \({\tilde{b}}^1_i\) and \({\tilde{b}}^2_i\). Then, we accumulate squared Euclidean distances from \({\mathbf {x}}_i\) to the second BMU's prototype vector \(\tilde{{\mathbf {m}}}_{{\tilde{b}}^2_i}\): starting with the squared distance from \({\mathbf {x}}_i\) to \(\tilde{{\mathbf {m}}}_{{\tilde{b}}^1_i}\), and thereafter following a shortest path to \(\tilde{{\mathbf {m}}}_{{\tilde{b}}^2_i}\) that passes only through neighboring units on the map. Finally, the combined error (CE) is the average of this quantity over the input samples.

Let p be a path on the map of length \(P \ge 1\), from \(p(0) = {\tilde{b}}^1_i\) to \(p(P) = {\tilde{b}}^2_i\), such that p(k) and \(p(k+1)\) are neighbors for \(k = 0 \ldots P-1\). The distance along the shortest path on the map is computed as:

$$\begin{aligned} \text {CE}_i = ||{\mathbf {x}}_i - \tilde{{\mathbf {m}}}_{{\tilde{b}}^1_i}||_2^2 + \underset{p}{\min } \sum _{k=0}^{P-1} ||\tilde{{\mathbf {m}}}_{p(k+1)} - \tilde{{\mathbf {m}}}_{p(k)}||_2^2 \end{aligned}$$

Finally, the combined error is:

$$\begin{aligned} \text {CE}(\{\tilde{{\mathbf {m}}}_k\}, {\mathbb {X}}) = \frac{1}{N} \sum _{i=1}^N \text {CE}_i \end{aligned}$$

As usual, we also define combined error in latent space:

$$\begin{aligned}&\hat{\text {CE}}_i = ||{\mathbf {z}}_i - {\mathbf {m}}_{b^1_i}||_2^2 + \underset{p}{\min } \sum _{k=0}^{P-1} ||{\mathbf {m}}_{p(k+1)} - {\mathbf {m}}_{p(k)}||_2^2\\&\hat{\text {CE}}(\{{\mathbf {m}}_k\}, {\mathbb {Z}}) = \frac{1}{N} \sum _{i=1}^N \hat{\text {CE}}_i \end{aligned}$$
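Because of the shortest-path term, combined error is the most involved of these metrics. The sketch below builds a graph whose nodes are map units, whose edges connect grid neighbors, and whose edge weights are squared prototype distances, then uses SciPy's Dijkstra routine; it is an illustration under the same rectangular-grid assumptions as above, not a reference implementation.

```python
import numpy as np
from scipy.sparse.csgraph import dijkstra

def combined_error(prototypes, data, map_rows, map_cols):
    """Combined error: distance to the BMU plus shortest neighbor-path to the second BMU."""
    pos = np.array([(r, c) for r in range(map_rows) for c in range(map_cols)])
    grid = np.abs(pos[:, None, :] - pos[None, :, :]).sum(axis=-1)          # (K, K) grid distances
    proto_d2 = ((prototypes[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=-1)
    # Edges only between neighboring units; zero entries of a dense matrix are treated
    # as missing edges by csgraph, so identical neighboring prototypes are a degenerate
    # case not handled by this sketch.
    graph = np.where(grid == 1, proto_d2, 0.0)
    paths = dijkstra(graph, directed=False)                                # all-pairs shortest paths
    d2 = ((data[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=-1)   # (N, K)
    two_best = np.argsort(d2, axis=1)[:, :2]
    b1, b2 = two_best[:, 0], two_best[:, 1]
    return float(np.mean(d2[np.arange(len(data)), b1] + paths[b1, b2]))
```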

Topographic product The topographic product (TP) [3] measures the preservation of neighborhood relations between vector space and the map. It depends only on the prototype vectors and the map topology, and indicates whether the dimension of the map is appropriate for the dataset or whether it introduces neighborhood violations.

We denote by d the Euclidean distance in vector space and by \(\delta\) the topographic distance on the map. The computation of TP starts by defining two ratios comparing the distance from a prototype j to its k-th nearest neighbor on the map, \(n_k^{\delta }(j)\), with the distance to its k-th nearest neighbor in vector space, \(n_k^d(j)\):

$$\begin{aligned} Q_1(j, k) = \frac{d\left( \tilde{{\mathbf {m}}}_j, \tilde{{\mathbf {m}}}_{n_k^{\delta }(j)} \right) }{d\left( \tilde{{\mathbf {m}}}_j, \tilde{{\mathbf {m}}}_{n_k^d(j)} \right) }, \; Q_2(j, k) = \frac{\delta \left( j, n_k^{\delta }(j)\right) }{\delta \left( j, n_k^d(j)\right) } \end{aligned}$$

Naturally, we always have \(Q_1 \ge 1\) and \(Q_2 \le 1\). The ratios are combined into a product in order to obtain a symmetric measure and mitigate local magnification factors:

$$\begin{aligned} P_3(j, k) = \left[ \prod _{l=1}^k Q_1(j, l) Q_2(j, l) \right] ^{\frac{1}{2k}} \end{aligned}$$

Finally, the topographic product is obtained by taking the logarithm and averaging over all map units and neighborhood orders:

$$\begin{aligned} \text {TP} = \frac{1}{K(K-1)}\sum _{j=1}^K \sum _{k=1}^{K-1} \log P_3(j, k) \end{aligned}$$

\(\text {TP} < 0\) indicates that the map dimension is too low to correctly represent the dataset; \(\text {TP} = 0\) means the dimension is adequate; and \(\text {TP} > 0\) indicates a dimension that is too high, producing neighborhood violations. A latent topographic product \(\hat{\text {TP}}\) is defined in exactly the same manner, simply replacing the prototypes \(\tilde{{\mathbf {m}}}_j\) by their latent counterparts \({\mathbf {m}}_j\).
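A direct sketch of the topographic product for a rectangular map, assuming Euclidean distance between grid coordinates as the map metric \(\delta\); ties between equidistant units are broken arbitrarily by the sort, and duplicate prototypes are a degenerate case not handled here.

```python
import numpy as np

def topographic_product(prototypes, map_rows, map_cols):
    """Topographic product of a rectangular SOM, from prototypes and map topology only."""
    K = map_rows * map_cols
    pos = np.array([(r, c) for r in range(map_rows) for c in range(map_cols)], dtype=float)
    d_map = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)             # delta
    d_vec = np.linalg.norm(prototypes[:, None, :] - prototypes[None, :, :], axis=-1)
    total = 0.0
    for j in range(K):
        # k-th nearest neighbors of unit j on the map and in vector space (excluding j itself)
        nn_map = np.argsort(d_map[j])[1:]
        nn_vec = np.argsort(d_vec[j])[1:]
        q1 = d_vec[j, nn_map] / d_vec[j, nn_vec]   # Q1(j, k) >= 1
        q2 = d_map[j, nn_map] / d_map[j, nn_vec]   # Q2(j, k) <= 1
        # log P3(j, k) = (1 / 2k) * sum_{l<=k} log(Q1(j, l) * Q2(j, l))
        log_p3 = np.cumsum(np.log(q1 * q2)) / (2.0 * np.arange(1, K))
        total += log_p3.sum()
    return total / (K * (K - 1))
```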

Appendix 2: map visualizations

See Fig. 17.

Fig. 17 20-by-20 DESOM maps of MNIST and Fashion-MNIST


About this article


Cite this article

Forest, F., Lebbah, M., Azzag, H. et al. Deep embedded self-organizing maps for joint representation learning and topology-preserving clustering. Neural Comput & Applic 33, 17439–17469 (2021). https://doi.org/10.1007/s00521-021-06331-w
