
Deep embedded self-organizing maps for joint representation learning and topology-preserving clustering

  • Original Article
  • Neural Computing and Applications

Abstract

A recent research area in unsupervised learning is the combination of representation learning with deep neural networks and data clustering. While the success of deep learning for supervised tasks is well established, recent research has demonstrated that, with suitable regularization, neural networks can learn intermediate representations that improve clustering in their feature space. By treating representation learning and clustering as a joint task, models learn clustering-friendly spaces and outperform two-stage approaches in which dimensionality reduction and clustering are performed separately. Recently, this idea has been extended to topology-preserving clustering models known as self-organizing maps (SOM). This work is a thorough study of the deep embedded self-organizing map (DESOM), a model composed of an autoencoder and a SOM layer that jointly trains the code vectors and the network weights to learn SOM-friendly representations. In other words, the SOM induces a form of regularization that improves the quality of quantization and topology in latent space. After detailing the architecture, loss and training algorithm, we study the hyperparameters through a series of experiments. Different SOM-based models are evaluated in terms of clustering, visualization and classification on benchmark datasets. We examine the benefits and trade-offs of joint representation learning and self-organization. DESOM achieves competitive results, requires no pretraining and produces topologically organized visualizations.


Notes

  1. https://github.com/FlorentF9/DESOM.

  2. https://github.com/FlorentF9/SOMperf.

  3. https://github.com/FlorentF9/DESOM.

  4. https://github.com/FlorentF9/SOMperf.

  5. https://github.com/JustGlowing/minisom.

  6. https://github.com/FlorentF9/SOMperf.

References

  1. Aljalbout E, Golkov V, Siddiqui Y, Cremers D (2018) Clustering with deep learning: taxonomy and new methods. arXiv:1801.07648

  2. Arpit D, Zhou Y, Ngo H, Govindaraju V (2016) Why regularized auto-encoders learn sparse representation? In: International Conference on Machine Learning (ICML) 1:211–223

  3. Bauer HU, Pawelzik K, Geisel T (1992) A topographic product for the optimization of self-organizing feature maps. NIPS 4:1141–1147

  4. Carniel R, Jolly AD, Barbui L (2013) Analysis of phreatic events at Ruapehu volcano, New Zealand using a new SOM approach. J Volcanol Geotherm Res 254:69–79. https://doi.org/10.1016/j.jvolgeores.2012.12.026

  5. Côme E, Cottrell M, Verleysen M, Lacaille J (2011) Aircraft engine fleet monitoring using self-organizing maps and edit distance. In: International Workshop on Self-Organizing Maps and Learning Vector Quantization, Clustering and Data Visualization (WSOM), pp 298–307

  6. DeVries T, Taylor GW (2017) Dataset augmentation in feature space. In: ICLR Workshop

  7. Diday E, Simon JC (1976) Clustering analysis. Springer, Berlin, Heidelberg, pp 47–94. https://doi.org/10.1007/978-3-642-96303-2_3

  8. Dilokthanakul N, Mediano PA, Garnelo M, Lee MC, Salimbeni H, Arulkumaran K, Shanahan M (2017) Deep unsupervised clustering with Gaussian mixture variational autoencoders

  9. Ghasedi Dizaji K, Herandi A, Deng C, Cai W, Huang H (2017) Deep clustering via joint convolutional autoencoder embedding and relative entropy minimization. In: ICCV, pp 5747–5756

  10. Elend L, Kramer O (2019) Self-organizing maps with convolutional layers. In: International Workshop on Self-Organizing Maps and Learning Vector Quantization, Clustering and Data Visualization (WSOM)

  11. Fard MM, Thonet T, Gaussier E (2018) Deep k-means: jointly clustering with k-means and learning representations. arXiv:1806.10069

  12. Faure C, Olteanu M, Bardet JM, Lacaille J (2017) Using self-organizing maps for clustering and labelling aircraft engine data phases. In: International Workshop on Self-Organizing Maps and Learning Vector Quantization, Clustering and Data Visualization (WSOM)

  13. Ferles C, Papanikolaou Y, Naidoo KJ (2018) Denoising autoencoder self-organizing map (DASOM). Neural Netw 105:112–131. https://doi.org/10.1016/j.neunet.2018.04.016

  14. Forest F, Cochard Q, Noyer C, Cabut A, Joncour M, Lacaille J, Lebbah M, Azzag H (2020) Large-scale vibration monitoring of aircraft engines from operational data using self-organized models. In: Annual Conference of the PHM Society

  15. Forest F, Lacaille J, Lebbah M, Azzag H (2018) A generic and scalable pipeline for large-scale analytics of continuous aircraft engine data. In: IEEE International Conference on Big Data

  16. Forest F, Lebbah M, Azzag H, Lacaille J (2019) Deep architectures for joint clustering and visualization with self-organizing maps. In: PAKDD Workshop on Learning Data Representations for Clustering (LDRC)

  17. Forest F, Lebbah M, Azzag H, Lacaille J (2019) Deep embedded SOM: joint representation learning and self-organization. In: European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN)

  18. Forest F, Lebbah M, Azzag H, Lacaille J (2020) A survey and implementation of performance metrics for self-organized maps. arXiv:2011.05847

  19. Fortuin V, Hüser M, Locatello F, Strathmann H, Rätsch G (2019) SOM-VAE: interpretable discrete representation learning on time series

  20. Guo X, Gao L, Liu X, Yin J (2017) Improved deep embedded clustering with local structure preservation. In: International Joint Conference on Artificial Intelligence (IJCAI), pp 1753–1759

  21. Guo X, Liu X, Zhu E, Yin J (2017) Deep clustering with convolutional autoencoders. In: ICONIP

  22. Harchaoui W, Mattei PA, Alamansa A, Bouveyron C (2018) Wasserstein adversarial mixture clustering. https://hal.archives-ouvertes.fr/hal-01827775/

  23. Hinton GE, Salakhutdinov R (2006) Reducing the dimensionality of data with neural networks. Science 313:504–507

  24. Jiang Z, Zheng Y, Tan H, Tang B, Zhou H (2017) Variational deep embedding: an unsupervised and generative approach to clustering. In: International Joint Conference on Artificial Intelligence (IJCAI), pp 1965–1972

  25. Kaski S, Lagus K (1996) Comparing self-organizing maps

  26. Kingma DP, Welling M (2014) Stochastic gradient VB and the variational auto-encoder. In: International Conference on Learning Representations (ICLR). arXiv:1312.6114

  27. Kohonen T (1982) Self-organized formation of topologically correct feature maps. Biol Cybern 43(1):59–69

  28. Kohonen T (1990) The self-organizing map. Proc IEEE 78:1464–1480

  29. LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324

  30. Lewis DD, Yang Y, Rose TG, Li F (2004) RCV1: a new benchmark collection for text categorization research. J Mach Learn Res 5:361–397

  31. Liu Z, Cao J, Chen S, Lu Y, Tan F (2020) Visualization analysis of seismic facies based on deep embedded SOM. IEEE Geosci Remote Sens Lett 18(8):1491–1495

  32. Ma Q, Zheng J, Li S, Cottrell GW (2019) Learning representations for time series clustering. In: NeurIPS

  33. Madaan P, Maiti A (2019) Deep mean shift clustering. PhD thesis, Indraprastha Institute of Information Technology

  34. Manduchi L, Hüser M, Rätsch G, Fortuin V (2020) DPSOM: deep probabilistic clustering with self-organizing maps. arXiv:1910.01590

  35. McInnes L, Healy J, Melville J (2018) UMAP: uniform manifold approximation and projection for dimension reduction. arXiv:1802.03426

  36. Medeiros HR, Braga PHM, Bassani HF (2020) Deep clustering self-organizing maps with relevance learning. In: ICML LatinX in AI Research Workshop

  37. Min E, Guo X, Liu Q, Zhang G, Cui J, Long J (2018) A survey of clustering with deep learning: from the perspective of network architecture. IEEE Access 6:39501–39514

  38. Mukherjee S, Asnani H, Lin E, Kannan S (2019) ClusterGAN: latent space clustering in generative adversarial networks. In: Proceedings of the AAAI Conference on Artificial Intelligence 33:4610–4617. https://aaai.org/ojs/index.php/AAAI/article/view/4385

  39. Ng A (2011) Sparse autoencoder. Technical report, Stanford University. https://web.stanford.edu/class/cs294a/sparseAutoencoder.pdf

  40. Pesteie M, Abolmaesumi P, Rohling R (2018) Deep neural maps. In: ICML Workshop. arXiv:1810.07291

  41. Rezende DJ, Mohamed S, Wierstra D (2014) Stochastic backpropagation and approximate inference in deep generative models. In: International Conference on Machine Learning (ICML) 4:3057–3070

  42. Song C, Huang Y, Liu F, Wang Z, Wang L (2014) Deep auto-encoder based clustering. Intell Data Anal 18(6):65–76

  43. Ullah A, Haydarov K, Haq IU, Muhammad K, Rho S, Lee M, Baik SW (2020) Deep learning assisted buildings energy consumption profiling using smart meter data. Sensors 20(3):873

  44. van den Oord A, Vinyals O, Kavukcuoglu K (2017) Neural discrete representation learning. In: NIPS. arXiv:1711.00937

  45. Villmann T, Biehl M, Villmann A, Saralajew S (2017) Fusion of deep learning architectures, multilayer feedforward networks and learning vector quantizers for deep classification learning. In: International Workshop on Self-Organizing Maps and Learning Vector Quantization, Clustering and Data Visualization (WSOM)

  46. Vincent P, Larochelle H, Lajoie I, Bengio Y, Manzagol PA (2010) Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. J Mach Learn Res 11:3371–3408

  47. Wu H, Flierl M (2020) Vector quantization-based regularization for autoencoders. In: Proceedings of the AAAI Conference on Artificial Intelligence. arXiv:1905.11062

  48. Xiao H, Rasul K, Vollgraf R (2017) Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. arXiv:1708.07747

  49. Xie J, Girshick R, Farhadi A (2016) Unsupervised deep embedding for clustering analysis. In: International Conference on Machine Learning (ICML), vol 48. arXiv:1511.06335

  50. Yang B, Fu X, Sidiropoulos ND, Hong M (2017) Towards k-means-friendly spaces: simultaneous deep learning and clustering. In: International Conference on Machine Learning (ICML). arXiv:1610.04794

  51. Zhu D, Han T, Zhou L, Yang X, Wu YN (2019) Deep unsupervised clustering with clustered generator model. arXiv:1911.08459


Acknowledgements

This research was funded by the French agency for research and technology (ANRT) through the CIFRE Grant 2017/1279 and by Safran Aircraft Engines (Safran group).

Author information


Corresponding author

Correspondence to Florent Forest.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1: performance evaluation

Quantitative evaluation of self-organizing maps is not as straightforward as for supervised classification tasks. To assess and compare the performance of models, we implemented and evaluated a collection of metrics developed in the related literature. This literature spans almost 30 years, and implementations are not easy to find; we therefore implemented these metrics and provide them as an open-source library, SOMperf (see footnote 6) [18]. SOM performance metrics can first be categorized into two families:

  1. Clustering metrics. Any clustering quality measure that relies solely on the prototype vectors and not on their topological organization. This encompasses all quality indices used in the clustering literature (e.g., purity, normalized mutual information (NMI), Rand index).

  2. Topographic metrics. Under this term, we group quality measures that, on the contrary, assess the topological organization of the model. Some of these indices also evaluate clustering quality, but we call a metric topographic as soon as it incorporates the map topology.

On another level, we can also classify them into two well-known families, depending on the use or not of ground-truth label knowledge:

  1. Internal indices, using only intrinsic properties of the model and the data.

  2. External indices, relying on external ground-truth class labels to evaluate results, as in supervised classification.

For instance, quantization error falls into the clustering metric category (it measures how well the SOM cluster centers fit the data distribution, without using any topology information) and is an internal index (it does not depend on external labels). On the other hand, the Class Scatter Index (introduced in [10]) is a topographic metric and an external index, as it measures how ground-truth class labels are organized into groups of neighboring map units.

1.1 Clustering metrics

1.1.1 Internal indices

Quantization error Quantization error is the average error made when projecting data onto the SOM, i.e., the mean Euclidean distance between a data sample and its best-matching unit. It can be measured in the original space, using the prototypes reconstructed by the decoder (\(\text {QE}\)), or in latent space (\(\hat{\text {QE}}\)). We introduce the notations \({\tilde{b}}_i\) for the best-matching unit of data point \({\mathbf {x}}_i\) in original space, and \(b_i\) for the best-matching unit of \({\mathbf {z}}_i\) in latent space.

$$\begin{aligned}&\text {QE}(\{\tilde{{\mathbf {m}}}_k\}, {\mathbb {X}}) = \frac{1}{N} \sum _{i=1}^N ||{\mathbf {x}}_i - \tilde{{\mathbf {m}}}_{{\tilde{b}}_i}||_2\\&\hat{\text {QE}}(\{{\mathbf {m}}_k\}, {\mathbb {Z}}) = \frac{1}{N} \sum _{i=1}^N ||{\mathbf {z}}_i - {\mathbf {m}}_{b_i}||_2 \end{aligned}$$
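For concreteness, the snippet below is a minimal NumPy sketch of this computation (SOMperf provides reference implementations; the function and variable names here are ours). The same function yields \(\text {QE}\) or \(\hat{\text {QE}}\) depending on whether decoded prototypes and original data, or latent prototypes and latent codes, are passed in.

```python
import numpy as np

def quantization_error(prototypes, data):
    """Mean Euclidean distance between each sample and its best-matching unit.

    prototypes: (K, d) array of prototype vectors (decoded or latent).
    data:       (N, d) array of samples living in the same space.
    """
    # Pairwise Euclidean distances between samples and prototypes: (N, K)
    dists = np.linalg.norm(data[:, None, :] - prototypes[None, :, :], axis=-1)
    # Distance to the best-matching unit of each sample, averaged over samples
    return float(dists.min(axis=1).mean())
```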

Prototype sharpness ratio In the case of images, we want the prototype images to be realistic and as sharp as the original images. When visualizing a self-organized map of an image dataset, we usually display the image corresponding to each prototype vector. If these images are blurry (due to the averaging induced by the SOM algorithm), the visualization will be of poor quality even if the quantization error is low (because QE is only an average Euclidean distance). We chose a very simple sharpness measure for an image: the average squared norm of its 2D pixel gradients. The sharpness of a SOM is defined as the average sharpness of its prototypes, which can then be compared with the average sharpness of the images in the original dataset. We introduce the prototype sharpness ratio (PSR), defined as follows:

$$\begin{aligned} \text {PSR}(\{\tilde{{\mathbf {m}}}_k\}, {\mathbb {X}}) = \frac{\text {average prototype sharpness}}{\text {average dataset sharpness}} = \frac{\frac{1}{K} \sum _{k=1}^K ||\nabla _{\text {2D}}\tilde{{\mathbf {m}}}_k||_2^2}{\frac{1}{N} \sum _{i=1}^N ||\nabla _{\text {2D}}{\mathbf {x}}_i||_2^2} \end{aligned}$$

A score lower than 1 means that the prototypes are on average blurrier than the original images; a score larger than 1 means they are less blurry (i.e., crisper or noisier) than the originals. The closer the PSR is to 1, the better.
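The following sketch illustrates this computation for a batch of grayscale images of identical size; it is an illustration under that assumption, not the exact implementation used in our experiments. Averaging over pixels instead of summing per image only rescales numerator and denominator by the same constant, so the ratio is unchanged.

```python
import numpy as np

def mean_sharpness(images):
    """Mean squared norm of 2D pixel gradients over a batch of (N, H, W) images."""
    gy, gx = np.gradient(images.astype(float), axis=(1, 2))
    return float(np.mean(gx ** 2 + gy ** 2))

def prototype_sharpness_ratio(decoded_prototypes, data):
    """PSR: sharpness of decoded prototype images relative to the dataset sharpness."""
    return mean_sharpness(decoded_prototypes) / mean_sharpness(data)
```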

1.1.2 External indices

A clustering with K clusters is described by the sets of data points belonging to each cluster, denoted \({\mathbf {Q}} = \{Q_k\}, k = 1 \ldots K\). In order to define the external clustering criteria, we assume that a label is associated with each data point, corresponding to a set of C different classes. We denote by \({\mathbf {Y}} = \{Y_j\}, j = 1 \ldots C\) the sets of elements belonging to each class.

Purity Purity is one of the most commonly used external quality indices. It measures the purity of clusters with respect to ground-truth class labels. To compute the purity of a clustering \({\mathbf {Q}}\), each cluster is assigned to the class which is most frequent in the cluster, and then the accuracy of this assignment is measured by counting the number of correctly assigned points and dividing by the total number of points. Formally:

$$\begin{aligned} \text {Pur}({\mathbf {Q}}, {\mathbf {Y}}) = \frac{1}{N} \sum _{k=1}^K \underset{j = 1 \ldots C}{\max } |Q_k \cap Y_j| \end{aligned}$$
(9)

High purity is easy to achieve when the number of data points per cluster is small; in particular, purity equals 1 if each point gets its own cluster. Thus, purity cannot be used to trade off the validity of the clustering against the number of clusters.
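A minimal sketch of purity, assuming integer-encoded cluster assignments and class labels (names are ours, not the SOMperf API):

```python
import numpy as np

def purity(y_pred, y_true):
    """Purity: assign each cluster to its majority class and measure the accuracy.

    y_pred: (N,) integer cluster assignments.
    y_true: (N,) integer ground-truth class labels.
    """
    correct = 0
    for k in np.unique(y_pred):
        # Size of the largest intersection |Q_k ∩ Y_j| over classes j
        labels_in_cluster = y_true[y_pred == k]
        correct += np.bincount(labels_in_cluster).max()
    return correct / len(y_true)
```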

Normalized mutual information Normalized mutual information (NMI) is also one of the most widespread external clustering indices:

$$\begin{aligned} \text {NMI}({\mathbf {Q}}, {\mathbf {Y}}) = \frac{I({\mathbf {Q}}, {\mathbf {Y}})}{\frac{1}{2}\left( H({\mathbf {Q}}) + H({\mathbf {Y}})\right) } \end{aligned}$$

where I is mutual information:

$$\begin{aligned} I({\mathbf {Q}}, {\mathbf {Y}}) = \sum _k \sum _j P(Q_k \cap Y_j) \log \frac{P(Q_k \cap Y_j)}{P(Q_k)P(Y_j)} = \sum _k \sum _j \frac{|Q_k \cap Y_j|}{N} \log \frac{N|Q_k \cap Y_j|}{|Q_k||Y_j|} \end{aligned}$$

and H is entropy:

$$\begin{aligned} H({\mathbf {Q}}) = -\sum _k P(Q_k) \log P(Q_k) = -\sum _k \frac{|Q_k|}{N} \log \frac{|Q_k|}{N} \end{aligned}$$
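Putting the three definitions together, a straightforward (if unoptimized) sketch of NMI is given below; scikit-learn's normalized_mutual_info_score can also be used for this purpose.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a labeling, H(Q) or H(Y)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))

def nmi(y_pred, y_true):
    """Normalized mutual information: I(Q, Y) / (0.5 * (H(Q) + H(Y)))."""
    n = len(y_true)
    mi = 0.0
    for k in np.unique(y_pred):
        for j in np.unique(y_true):
            n_kj = np.sum((y_pred == k) & (y_true == j))
            if n_kj == 0:
                continue  # empty intersections contribute nothing
            n_k = np.sum(y_pred == k)
            n_j = np.sum(y_true == j)
            mi += (n_kj / n) * np.log(n * n_kj / (n_k * n_j))
    return mi / (0.5 * (entropy(y_pred) + entropy(y_true)))
```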

Unsupervised clustering accuracy Accuracy is the most common metric in supervised classification: the number of examples assigned to the correct class divided by the total number of examples. It can also be used in clustering (i.e., unsupervised classification) as an external quality measure, provided labels are available and the number of clusters equals the number of classes. It is the accuracy of the resulting classification under the best one-to-one mapping between clusters and class labels. Denoting this mapping by m, unsupervised clustering accuracy is:

$$\begin{aligned} \text {Acc}({\mathbf {Q}}, {\mathbf {Y}}) = \frac{1}{N} \underset{m}{\max }\sum _{k=1}^K |Q_k \cap Y_{m(k)}| \end{aligned}$$
(10)

The best mapping can be found using the Hungarian assignment algorithm, also known as the Kuhn–Munkres algorithm.
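As a sketch, the assignment can be computed with SciPy's linear_sum_assignment: build the cluster-class contingency table and pick the one-to-one mapping that maximizes the matched counts. Labels are assumed integer-encoded; names are ours.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(y_pred, y_true):
    """Unsupervised clustering accuracy using the best one-to-one cluster-class mapping."""
    clusters = np.unique(y_pred)
    classes = np.unique(y_true)
    # Contingency table: w[a, b] = |Q_a ∩ Y_b|
    w = np.zeros((len(clusters), len(classes)), dtype=np.int64)
    for a, k in enumerate(clusters):
        for b, j in enumerate(classes):
            w[a, b] = np.sum((y_pred == k) & (y_true == j))
    # Hungarian algorithm on the negated table maximizes the matched mass
    row_ind, col_ind = linear_sum_assignment(-w)
    return w[row_ind, col_ind].sum() / len(y_true)
```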

1.2 Topographic metrics

Distortion Distortion is the loss function minimized by the SOM learning algorithm. It is similar to the within-cluster sum of squared errors (WSSE) minimized by k-means, but with an additional topology constraint introduced by the neighborhood function. It is computed as the sum of squared Euclidean distances between each data sample and every map prototype, each term weighted by the neighborhood function evaluated at the topographic distance between that prototype's unit and the sample's best-matching unit. Like quantization error, it can be measured in the original data space or in latent space:

$$\begin{aligned}&\text {D}(\{\tilde{{\mathbf {m}}}_k\}, {\mathbb {X}}, T) = \frac{1}{N} \sum _{i=1}^N \sum _{k=1}^K {\mathcal {K}}^T\left( \delta ({\tilde{b}}_i, k)\right) ||{\mathbf {x}}_i - \tilde{{\mathbf {m}}}_k||_2^2\\&\hat{\text {D}}(\{{\mathbf {m}}_k\}, {\mathbb {Z}}, T) = \frac{1}{N} \sum _{i=1}^N \sum _{k=1}^K {\mathcal {K}}^T\left( \delta (b_i, k)\right) ||{\mathbf {z}}_i - {\mathbf {m}}_k||_2^2 \end{aligned}$$

In DESOM, the SOM loss corresponds to latent distortion.
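As an illustration, here is a sketch of distortion on a rectangular map, assuming a Gaussian neighborhood function over the Manhattan distance between grid positions; other kernels and map metrics can be substituted, and SOMperf contains the reference implementation.

```python
import numpy as np

def grid_distances(map_rows, map_cols):
    """Pairwise Manhattan distances delta(k, l) between units of a rectangular map."""
    pos = np.array([(r, c) for r in range(map_rows) for c in range(map_cols)])
    return np.abs(pos[:, None, :] - pos[None, :, :]).sum(axis=-1)

def distortion(prototypes, data, map_rows, map_cols, T):
    """SOM distortion with a Gaussian neighborhood of temperature T.

    prototypes: (K, d) prototypes, ordered row by row on the map.
    data:       (N, d) samples in the same space (original or latent).
    """
    delta = grid_distances(map_rows, map_cols)                 # (K, K) grid distances
    h = np.exp(-delta.astype(float) ** 2 / (2.0 * T ** 2))     # neighborhood weights
    d2 = ((data[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=-1)  # (N, K)
    bmus = d2.argmin(axis=1)                                   # best-matching units
    # Neighborhood-weighted sum over all units, averaged over samples
    return float(np.mean((h[bmus] * d2).sum(axis=1)))
```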

Topographic error Topographic error assesses the self-organization of a SOM model. It is calculated as the fraction of samples whose best and second-best matching units are not neighbors on the map. In other words, this error quantifies the smoothness of projections on the self-organized map. Using the notations \({\tilde{b}}^k_i\) and \(b^k_i\) for the k-th best-matching units of \({\mathbf {x}}_i\) and \({\mathbf {z}}_i\) in original and latent space, respectively, we define topographic error \(\text {TE}\) and latent topographic error \(\hat{\text {TE}}\):

$$\begin{aligned}&\text {TE}(\{\tilde{{\mathbf {m}}}_k\}, {\mathbb {X}}) = \frac{1}{N} \sum _{i=1}^N \mathbb {1}_{\delta ({\tilde{b}}^1_i, {\tilde{b}}^2_i)> 1}\\&\hat{\text {TE}}(\{{\mathbf {m}}_k\}, {\mathbb {Z}}) = \frac{1}{N} \sum _{i=1}^N \mathbb {1}_{\delta (b^1_i, b^2_i) > 1} \end{aligned}$$
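A sketch for a rectangular map, where two units are considered neighbors if their Manhattan grid distance is 1 (i.e., \(\delta (b^1, b^2) \le 1\)); same assumptions and naming conventions as above.

```python
import numpy as np

def topographic_error(prototypes, data, map_rows, map_cols):
    """Fraction of samples whose two best-matching units are not adjacent on the map."""
    pos = np.array([(r, c) for r in range(map_rows) for c in range(map_cols)])
    d2 = ((data[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=-1)  # (N, K)
    two_best = np.argsort(d2, axis=1)[:, :2]          # BMU and second BMU of each sample
    # Manhattan distance between the two BMUs on the grid
    grid_dist = np.abs(pos[two_best[:, 0]] - pos[two_best[:, 1]]).sum(axis=1)
    return float(np.mean(grid_dist > 1))
```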

Combined error Combined error [25] is an error measure that combines and extends quantization and topographic errors. Its computation is more involved than the previous indices. For a given data sample \({\mathbf {x}}_i\), we first compute its two best-matching units \({\tilde{b}}^1_i\) and \({\tilde{b}}^2_i\). Then, we accumulate squared Euclidean distances from \({\mathbf {x}}_i\) to the second BMU's prototype vector \(\tilde{{\mathbf {m}}}_{{\tilde{b}}^2_i}\): starting with the squared distance from \({\mathbf {x}}_i\) to \(\tilde{{\mathbf {m}}}_{{\tilde{b}}^1_i}\), and thereafter following a shortest path to \(\tilde{{\mathbf {m}}}_{{\tilde{b}}^2_i}\) that passes only through neighboring units on the map. Finally, the combined error (CE) is the average of this quantity over the input samples.

Let p be a path on the map of length \(P \ge 1\), from \(p(0) = {\tilde{b}}^1_i\) to \(p(P) = {\tilde{b}}^2_i\), such that p(k) and \(p(k+1)\) are neighbors for \(k = 0 \ldots P-1\). The distance along the shortest path on the map is computed as:

$$\begin{aligned} \text {CE}_i = ||{\mathbf {x}}_i - \tilde{{\mathbf {m}}}_{{\tilde{b}}^1_i}||_2^2 + \underset{p}{\min } \sum _{k=0}^{P-1} ||\tilde{{\mathbf {m}}}_{p(k+1)} - \tilde{{\mathbf {m}}}_{p(k)}||_2^2 \end{aligned}$$

Finally, the combined error is:

$$\begin{aligned} \text {CE}(\{\tilde{{\mathbf {m}}}_k\}, {\mathbb {X}}) = \frac{1}{N} \sum _{i=1}^N \text {CE}_i \end{aligned}$$

As usual, we also define combined error in latent space:

$$\begin{aligned}&\hat{\text {CE}}_i = ||{\mathbf {z}}_i - {\mathbf {m}}_{b^1_i}||_2^2 + \underset{p}{\min } \sum _{k=0}^{P-1} ||{\mathbf {m}}_{p(k+1)} - {\mathbf {m}}_{p(k)}||_2^2\\&\hat{\text {CE}}(\{{\mathbf {m}}_k\}, {\mathbb {Z}}) = \frac{1}{N} \sum _{i=1}^N \hat{\text {CE}}_i \end{aligned}$$
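Because of the shortest-path term, combined error is the most involved of these metrics. The sketch below builds a graph whose nodes are map units, whose edges connect grid neighbors, and whose edge weights are squared prototype distances, then uses SciPy's Dijkstra routine; it is an illustration under the same rectangular-grid assumptions as above, not a reference implementation.

```python
import numpy as np
from scipy.sparse.csgraph import dijkstra

def combined_error(prototypes, data, map_rows, map_cols):
    """Combined error: distance to the BMU plus shortest neighbor-path to the second BMU."""
    pos = np.array([(r, c) for r in range(map_rows) for c in range(map_cols)])
    grid = np.abs(pos[:, None, :] - pos[None, :, :]).sum(axis=-1)          # (K, K) grid distances
    proto_d2 = ((prototypes[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=-1)
    # Edges only between neighboring units; zero entries of a dense matrix are treated
    # as missing edges by csgraph, so identical neighboring prototypes are a degenerate
    # case not handled by this sketch.
    graph = np.where(grid == 1, proto_d2, 0.0)
    paths = dijkstra(graph, directed=False)                                # all-pairs shortest paths
    d2 = ((data[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=-1)   # (N, K)
    two_best = np.argsort(d2, axis=1)[:, :2]
    b1, b2 = two_best[:, 0], two_best[:, 1]
    return float(np.mean(d2[np.arange(len(data)), b1] + paths[b1, b2]))
```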

Topographic product The topographic product (TP) [3] measures the preservation of neighborhood relations between vector space and the map. It depends only on the prototype vectors and the map topology, and indicates whether the dimension of the map is appropriate for the dataset or whether it introduces neighborhood violations.

We denote by d the Euclidean distance in vector space and by \(\delta\) the topographic distance on the map. The computation of TP starts by defining two ratios comparing the distance from a prototype j to its k-th nearest neighbor on the map, \(n_k^{\delta }(j)\), with the distance to its k-th nearest neighbor in vector space, \(n_k^d(j)\):

$$\begin{aligned} Q_1(j, k) = \frac{d\left( \tilde{{\mathbf {m}}}_j, \tilde{{\mathbf {m}}}_{n_k^{\delta }(j)} \right) }{d\left( \tilde{{\mathbf {m}}}_j, \tilde{{\mathbf {m}}}_{n_k^d(j)} \right) }, \; Q_2(j, k) = \frac{\delta \left( j, n_k^{\delta }(j)\right) }{\delta \left( j, n_k^d(j)\right) } \end{aligned}$$

Naturally, we always have \(Q_1 \ge 1\) and \(Q_2 \le 1\). The ratios are combined into a product in order to obtain a symmetric measure and mitigate local magnification factors:

$$\begin{aligned} P_3(j, k) = \left[ \prod _{l=1}^k Q_1(j, l) Q_2(j, l) \right] ^{\frac{1}{2k}} \end{aligned}$$

Finally, the topographic product is obtained by taking the logarithm and averaging over all map units and neighborhood orders:

$$\begin{aligned} \text {TP} = \frac{1}{K(K-1)}\sum _{j=1}^K \sum _{k=1}^{K-1} \log P_3(j, k) \end{aligned}$$

\(\text {TP} < 0\) indicates that the map dimension is too low to correctly represent the dataset; \(\text {TP} = 0\) means the dimension is adequate; and \(\text {TP} > 0\) indicates a dimension that is too high, producing neighborhood violations. A latent topographic product \(\hat{\text {TP}}\) is defined in exactly the same manner, simply replacing the prototypes \(\tilde{{\mathbf {m}}}_j\) by their latent counterparts \({\mathbf {m}}_j\).
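A direct sketch of the topographic product for a rectangular map, assuming Euclidean distance between grid coordinates as the map metric \(\delta\); ties between equidistant units are broken arbitrarily by the sort, and duplicate prototypes are a degenerate case not handled here.

```python
import numpy as np

def topographic_product(prototypes, map_rows, map_cols):
    """Topographic product of a rectangular SOM, from prototypes and map topology only."""
    K = map_rows * map_cols
    pos = np.array([(r, c) for r in range(map_rows) for c in range(map_cols)], dtype=float)
    d_map = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)             # delta
    d_vec = np.linalg.norm(prototypes[:, None, :] - prototypes[None, :, :], axis=-1)
    total = 0.0
    for j in range(K):
        # k-th nearest neighbors of unit j on the map and in vector space (excluding j itself)
        nn_map = np.argsort(d_map[j])[1:]
        nn_vec = np.argsort(d_vec[j])[1:]
        q1 = d_vec[j, nn_map] / d_vec[j, nn_vec]   # Q1(j, k) >= 1
        q2 = d_map[j, nn_map] / d_map[j, nn_vec]   # Q2(j, k) <= 1
        # log P3(j, k) = (1 / 2k) * sum_{l<=k} log(Q1(j, l) * Q2(j, l))
        log_p3 = np.cumsum(np.log(q1 * q2)) / (2.0 * np.arange(1, K))
        total += log_p3.sum()
    return total / (K * (K - 1))
```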

Appendix 2: map visualizations

See Fig. 17.

Fig. 17 20-by-20 DESOM maps of MNIST and Fashion-MNIST


About this article


Cite this article

Forest, F., Lebbah, M., Azzag, H. et al. Deep embedded self-organizing maps for joint representation learning and topology-preserving clustering. Neural Comput & Applic 33, 17439–17469 (2021). https://doi.org/10.1007/s00521-021-06331-w
