Abstract
Unsupervised learning of discrete representations in neural networks (NNs) from continuous ones is essential for many modern applications. Vector Quantisation (VQ) has become popular for this, in particular in the context of generative models, such as Variational Auto-Encoders (VAEs), where the exponential moving average-based VQ (EMA-VQ) algorithm is often used. Here, we study an alternative VQ algorithm based on Kohonen’s learning rule for the Self-Organising Map (KSOM; 1982). EMA-VQ is a special case of KSOM. KSOM is known to offer two potential benefits: empirically, it converges faster than EMA-VQ, and KSOM-generated discrete representations form a topological structure on the grid whose nodes are the discrete symbols, resulting in an artificial version of the brain’s topographic map. We revisit these properties by using KSOM in VQ-VAEs for image processing. In our experiments, the speed-up compared to well-configured EMA-VQ is only observable at the beginning of training, but KSOM is generally much more robust, e.g., w.r.t. the choice of initialisation schemes (Our code is public: https://github.com/IDSIA/kohonen-vae. The full version with an appendix can be found at: https://arxiv.org/abs/2302.07950).
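The relation between KSOM and EMA-VQ stated above can be sketched in code. The following is a minimal NumPy illustration under our own assumptions, not the paper's implementation: the function name `ksom_update`, the Gaussian neighbourhood, and all hyper-parameters are illustrative. As the neighbourhood width `sigma` shrinks towards zero, the neighbourhood weights concentrate on the winner and the step reduces to a winner-only moving-average update, which is the sense in which EMA-VQ is a special case of KSOM.

```python
import numpy as np

def ksom_update(codebook, grid, x, lr=0.5, sigma=1.0):
    """One Kohonen (KSOM) update step for a VQ codebook on a grid.

    codebook: (K, d) code vectors; grid: (K, 2) node coordinates on the map;
    x: (d,) input vector. Returns the updated codebook.
    """
    # 1. Find the best-matching (winner) code vector for the input.
    dists = np.sum((codebook - x) ** 2, axis=1)
    winner = np.argmin(dists)
    # 2. Gaussian neighbourhood weights on the grid, centred at the winner.
    grid_d2 = np.sum((grid - grid[winner]) ** 2, axis=1)
    h = np.exp(-grid_d2 / (2.0 * sigma ** 2))  # shape (K,)
    # 3. Move every code vector towards x, scaled by its neighbourhood weight.
    #    With sigma -> 0, h is 1 at the winner and ~0 elsewhere: a
    #    winner-only moving-average step, as in EMA-VQ.
    return codebook + lr * h[:, None] * (x - codebook)

# Example: a 4x4 grid of 8-dimensional code vectors (sizes are illustrative).
rng = np.random.default_rng(0)
K, d = 16, 8
grid = np.array([[i, j] for i in range(4) for j in range(4)], dtype=float)
codebook = rng.standard_normal((K, d))
x = rng.standard_normal(d)
updated = ksom_update(codebook, grid, x, lr=0.5, sigma=1.0)
```

Because every code vector is pulled towards the input by a factor in (0, 1], each update can only move the codebook closer to the input; the grid topology of `h` is what induces the topographic ordering of the learned discrete symbols.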
K. Irie and R. Csordás—Equal contribution. Work done at IDSIA.
Notes
- 1. Here we use T to denote both the number of inputs and the number of iterations.
- 2. Further experimental details are available through the link on page 1.
- 3. We also observed that KSOM generally improves codebook utilisation. Further illustrations are available through the link provided on page 1.
- 4. Empirical results comparing KSOM and SOM-VAE are provided through the link on page 1. We find that the final reconstruction loss is similar for the EMA baseline, SOM-VAE, and KSOM, but KSOM converges the fastest (significantly faster than SOM-VAE). We also confirm that EMA-based codebook learning outperforms the gradient-based one, as noted by the original authors of VQ-VAE. We focus on KSOM because it is the natural extension of EMA-VQ, which is recommended over the gradient-based variant (SOM-VAE is the natural extension of the gradient-based variant).
References
Agustsson, E., et al.: Soft-to-hard vector quantization for end-to-end learning compressible representations. In: Proceedings of Advances in Neural Information Processing Systems (NIPS), pp. 1141–1151 (2017)
Amari, S.I.: Topographic organization of nerve fields. Bull. Math. Biol. 42(3), 339–364 (1980)
Baevski, A., Schneider, S., Auli, M.: vq-wav2vec: self-supervised learning of discrete speech representations. In: International Conference on Learning Representations (ICLR). Virtual only (2020)
Bengio, Y., Léonard, N., Courville, A.: Estimating or propagating gradients through stochastic neurons for conditional computation. Preprint arXiv:1308.3432 (2013)
Borsos, Z., et al.: AudioLM: a language modeling approach to audio generation. Preprint arXiv:2209.03143 (2022)
Chen, S.F., Goodman, J.: An empirical study of smoothing techniques for language modeling. Comput. Speech Lang. 13(4), 359–393 (1999)
Choi, Y., Uh, Y., Yoo, J., Ha, J.: StarGAN v2: diverse image synthesis for multiple domains. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 8185–8194. Virtual only (2020)
Constantinescu, A.O., O’Reilly, J.X., Behrens, T.E.: Organizing conceptual knowledge in humans with a gridlike code. Science 352(6292), 1464–1468 (2016)
Cottrell, M., Olteanu, M., Rossi, F., Villa-Vialaneix, N.: Self-organizing maps, theory and applications. Revista de Investigacion Operacional 39(1), 1–22 (2018)
Csordás, R., Irie, K., Schmidhuber, J.: CTL++: Evaluating generalization on never-seen compositional patterns of known functions, and compatibility of neural representations. In: Proceedings of Conference on Empirical Methods in Natural Language Processing (EMNLP), Abu Dhabi, UAE (2022)
De Bodt, E., Cottrell, M., Letremy, P., Verleysen, M.: On the use of self-organizing maps to accelerate vector quantization. Neurocomputing 56, 187–203 (2004)
Deng, J., Dong, W., Socher, R., Li, L., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Miami, Florida, USA, pp. 248–255 (2009)
Dhariwal, P., Jun, H., Payne, C., Kim, J.W., Radford, A., Sutskever, I.: Jukebox: a generative model for music. Preprint arXiv:2005.00341 (2020)
Esser, P., Rombach, R., Ommer, B.: Taming transformers for high-resolution image synthesis. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 12873–12883. Virtual only (2021)
Fortuin, V., Hüser, M., Locatello, F., Strathmann, H., Rätsch, G.: SOM-VAE: interpretable discrete representation learning on time series. In: International Conference on Learning Representations (ICLR), New Orleans, LA, USA (May 2019)
Fritzke, B.: A growing neural gas network learns topologies. In: Proceedings of Advances in Neural Information Processing Systems (NIPS), Denver, CO, USA, pp. 625–632 (1994)
Hebb, D.O.: The organization of behavior; a neuropsychological theory. Wiley Book Clin. Psychol. 62, 78 (1949)
Hinton, G.: Neural networks for machine learning. Coursera, video lectures (2012)
Hu, W., Miyato, T., Tokui, S., Matsumoto, E., Sugiyama, M.: Learning discrete representations via information maximizing self-augmented training. In: Proceedings of International Conference on Machine Learning (ICML), Sydney, Australia, pp. 1558–1567 (2017)
Hupkes, D., Singh, A., Korrel, K., Kruszewski, G., Bruni, E.: Learning compositionally through attentive guidance. In: Proceedings of International Conference on Computational Linguistics and Intelligent Text Processing, La Rochelle, France (2019)
Jang, E., Gu, S., Poole, B.: Categorical reparameterization with Gumbel-Softmax. In: International Conference on Learning Representations (ICLR), Toulon, France (2017)
Kaiser, L., Bengio, S., Roy, A., Vaswani, A., Parmar, N., Uszkoreit, J., Shazeer, N.: Fast decoding in sequence models using discrete latent variables. In: Proceedings of International Conference on Machine Learning (ICML), Stockholm, Sweden, pp. 2395–2404 (2018)
Karras, T., Aila, T., Laine, S., Lehtinen, J.: Progressive growing of GANs for improved quality, stability, and variation. In: International Conference on Learning Representations (ICLR), Vancouver, Canada (2018)
Keller, T.A., Welling, M.: Topographic VAEs learn equivariant capsules. In: Proceedings of Advances in Neural Information Processing Systems (NeurIPS), pp. 28585–28597. Virtual only (2021)
Kingma, D.P., Welling, M.: Auto-encoding variational Bayes. In: International Conference on Learning Representations (ICLR), Banff, Canada (2014)
Kohonen, T.: Self-organized formation of topologically correct feature maps. Biol. Cybern. 43(1), 59–69 (1982)
Kohonen, T.: Comparison of SOM point densities based on different criteria. Neural Comput. 11(8), 2081–2095 (1999)
Kohonen, T.: Self-organizing Maps. Springer, Heidelberg (2001). https://doi.org/10.1007/978-3-642-56927-2
Krizhevsky, A.: Learning multiple layers of features from tiny images. Master’s thesis, Computer Science Department, University of Toronto (2009)
Lee, D., Kim, C., Kim, S., Cho, M., Han, W.S.: Autoregressive image generation using residual quantization. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, pp. 11523–11532 (2022)
Liska, A., Kruszewski, G., Baroni, M.: Memorize or generalize? searching for a compositional RNN in a haystack. In: AEGAP Workshop ICML, Stockholm, Sweden (2018)
Liu, D., Niehues, J.: Learning an artificial language for knowledge-sharing in multilingual translation. In: Proceedings of Conference on Machine Translation (WMT), Abu Dhabi, pp. 188–202 (2022)
Liu, D., et al.: Adaptive discrete communication bottlenecks with dynamic vector quantization. Preprint arXiv:2202.01334 (2022)
Liu, D., et al.: Discrete-valued neural communication. In: Proceedings of Advances in Neural Information Processing Systems (NeurIPS), pp. 2109–2121. Virtual only (2021)
Lloyd, S.: Least squares quantization in PCM. IEEE Trans. Inf. Theory 28(2), 129–137 (1982)
MacQueen, J.: Some methods for classification and analysis of multivariate observations. In: Proceedings of Berkeley Symposium on Mathematical Statistics and Probability, pp. 281–297 (1967)
von der Malsburg, C.: Self-organization of orientation sensitive cells in the striate cortex. Kybernetik 14(2), 85–100 (1973)
von der Malsburg, C., Willshaw, D.J.: How to label nerve cells so that they can interconnect in an ordered fashion. Proc. Natl. Acad. Sci. 74(11), 5176–5178 (1977)
Manduchi, L., Hüser, M., Faltys, M., Vogt, J.E., Rätsch, G., Fortuin, V.: T-DPSOM: an interpretable clustering method for unsupervised learning of patient health states. In: Proceedings of Conference on Health, Inference, and Learning (CHIL), pp. 236–245. Virtual only (2021)
Martinetz, T., Schulten, K.: A “neural-gas” network learns topologies. In: Proceedings of International Conference on Artificial Neural Networks (ICANN), Espoo, Finland (1991)
Nasrabadi, N.M., Feng, Y.: Vector quantization of images based upon the Kohonen self-organizing feature maps. In: Proceedings of IEEE International Conference on Neural Networks (ICNN), vol. 1, pp. 101–105 (1988)
Oja, E.: Simplified neuron model as a principal component analyzer. J. Math. Biol. 15(3), 267–273 (1982)
van den Oord, A., Vinyals, O., Kavukcuoglu, K.: Neural discrete representation learning. In: Proceedings of Advances in Neural Information Processing Systems (NIPS), Long Beach, CA, pp. 6306–6315 (2017)
Ozair, S., Li, Y., Razavi, A., Antonoglou, I., van den Oord, A., Vinyals, O.: Vector quantized models for planning. In: Proceedings of International Conference on Machine Learning (ICML), pp. 8302–8313. Virtual only (2021)
Ramesh, A., et al.: Zero-shot text-to-image generation. In: Proceedings of International Conference on Machine Learning (ICML), vol. 139, pp. 8821–8831. Virtual only (2021)
Razavi, A., van den Oord, A., Vinyals, O.: Generating diverse high-fidelity images with VQ-VAE-2. In: Proceedings of Advances in Neural Information Processing Systems (NeurIPS), Vancouver, Canada, pp. 14837–14847 (2019)
Rombach, R., Blattmann, A., Lorenz, D., Esser, P., Ommer, B.: High-resolution image synthesis with latent diffusion models. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, pp. 10674–10685 (2022)
Roy, A., Vaswani, A., Parmar, N., Neelakantan, A.: Towards a better understanding of vector quantized autoencoders. OpenReview (2018)
Schlag, I., Irie, K., Schmidhuber, J.: Linear Transformers are secretly fast weight programmers. In: Proceedings of International Conference on Machine Learning (ICML). Virtual only (2021)
Schmidhuber, J.: Learning to control fast-weight memories: an alternative to recurrent nets. Technical Report. FKI-147-91, Institut für Informatik, Technische Universität München (March 1991)
Tirunagari, S., Bull, S., Kouchaki, S., Cooke, D., Poh, N.: Visualisation of survey responses using self-organising maps: a case study on diabetes self-care factors. In: Proceedings of IEEE Symposium Series on Computational Intelligence (SSCI), pp. 1–6 (2016)
Tjandra, A., Sakti, S., Nakamura, S.: Transformer VQ-VAE for unsupervised unit discovery and speech synthesis: ZeroSpeech 2020 challenge. In: Proceedings of Interspeech, pp. 4851–4855. Virtual only (2020)
Träuble, F., et al.: Discrete key-value bottleneck. Preprint arXiv:2207.11240 (2022)
Vaswani, A., et al.: Attention is all you need. In: Proceedings of Advances in Neural Information Processing Systems (NIPS), Long Beach, CA, USA, pp. 5998–6008 (2017)
Walker, J., Razavi, A., van den Oord, A.: Predicting video with VQ-VAE. Preprint arXiv:2103.01950 (2021)
Willshaw, D.J., von der Malsburg, C.: How patterned neural connections can be set up by self-organization. Proc. Roy. Soc. Lond. B, Biol. Sci. 194(1117), 431–445 (1976)
Willshaw, D.J., von der Malsburg, C.: A marker induction mechanism for the establishment of ordered neural mappings: its application to the retinotectal problem. Phil. Trans. Roy. Soc. Lond. B, Biol. Sci. 287(1021), 203–243 (1979)
Yan, W., Zhang, Y., Abbeel, P., Srinivas, A.: VideoGPT: video generation using vq-vae and transformers. Preprint arXiv:2104.10157 (2021)
Yin, H.: The self-organizing maps: background, theories, extensions and applications. In: Computational Intelligence: A Compendium, pp. 715–762 (2008)
Yu, J., et al.: Vector-quantized image modeling with improved VQGAN. In: International Conference on Learning Representations (ICLR). Virtual only (2022)
Yu, J., et al.: Scaling autoregressive models for content-rich text-to-image generation. Preprint arXiv:2206.10789 (2022)
Zeghidour, N., Luebs, A., Omran, A., Skoglund, J., Tagliasacchi, M.: Soundstream: an end-to-end neural audio codec. IEEE/ACM Trans. Audio Speech Lang. Process. 30, 495–507 (2021)
Acknowledgments
This research was partially funded by ERC Advanced grant no: 742870, project AlgoRNN, and by Swiss National Science Foundation grant no: 200021_192356, project NEUSYM. We are thankful for hardware donations from NVIDIA and IBM. The resources used for this work were partially provided by Swiss National Supercomputing Centre (CSCS) project s1145 and s1154.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Irie, K., Csordás, R., Schmidhuber, J. (2024). Self-organising Neural Discrete Representation Learning à la Kohonen. In: Wand, M., Malinovská, K., Schmidhuber, J., Tetko, I.V. (eds) Artificial Neural Networks and Machine Learning – ICANN 2024. ICANN 2024. Lecture Notes in Computer Science, vol 15016. Springer, Cham. https://doi.org/10.1007/978-3-031-72332-2_23
Print ISBN: 978-3-031-72331-5
Online ISBN: 978-3-031-72332-2