Abstract
Unsupervised learning of discrete representations in neural networks (NNs) from continuous ones is essential for many modern applications. Vector Quantisation (VQ) has become popular for this, in particular in the context of generative models, such as Variational Auto-Encoders (VAEs), where the exponential moving average-based VQ (EMA-VQ) algorithm is often used. Here, we study an alternative VQ algorithm based on Kohonen’s learning rule for the Self-Organising Map (KSOM; 1982). EMA-VQ is a special case of KSOM. KSOM is known to offer two potential benefits: empirically, it converges faster than EMA-VQ, and KSOM-generated discrete representations form a topological structure on the grid whose nodes are the discrete symbols, resulting in an artificial version of the brain’s topographic map. We revisit these properties by using KSOM in VQ-VAEs for image processing. In our experiments, the speed-up compared to well-configured EMA-VQ is only observable at the beginning of training, but KSOM is generally much more robust, e.g., w.r.t. the choice of initialisation schemes (Our code is public: https://github.com/IDSIA/kohonen-vae. The full version with an appendix can be found at: https://arxiv.org/abs/2302.07950).
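The relation between KSOM and EMA-VQ stated above can be sketched in code. The following is a minimal NumPy illustration under our own assumptions, not the paper's implementation: the function name `ksom_update`, the Gaussian neighbourhood, and all hyper-parameters are illustrative. As the neighbourhood width `sigma` shrinks towards zero, the neighbourhood weights concentrate on the winner and the step reduces to a winner-only moving-average update, which is the sense in which EMA-VQ is a special case of KSOM.

```python
import numpy as np

def ksom_update(codebook, grid, x, lr=0.5, sigma=1.0):
    """One Kohonen (KSOM) update step for a VQ codebook on a grid.

    codebook: (K, d) code vectors; grid: (K, 2) node coordinates on the map;
    x: (d,) input vector. Returns the updated codebook.
    """
    # 1. Find the best-matching (winner) code vector for the input.
    dists = np.sum((codebook - x) ** 2, axis=1)
    winner = np.argmin(dists)
    # 2. Gaussian neighbourhood weights on the grid, centred at the winner.
    grid_d2 = np.sum((grid - grid[winner]) ** 2, axis=1)
    h = np.exp(-grid_d2 / (2.0 * sigma ** 2))  # shape (K,)
    # 3. Move every code vector towards x, scaled by its neighbourhood weight.
    #    With sigma -> 0, h is 1 at the winner and ~0 elsewhere: a
    #    winner-only moving-average step, as in EMA-VQ.
    return codebook + lr * h[:, None] * (x - codebook)

# Example: a 4x4 grid of 8-dimensional code vectors (sizes are illustrative).
rng = np.random.default_rng(0)
K, d = 16, 8
grid = np.array([[i, j] for i in range(4) for j in range(4)], dtype=float)
codebook = rng.standard_normal((K, d))
x = rng.standard_normal(d)
updated = ksom_update(codebook, grid, x, lr=0.5, sigma=1.0)
```

Because every code vector is pulled towards the input by a factor in (0, 1], each update can only move the codebook closer to the input; the grid topology of `h` is what induces the topographic ordering of the learned discrete symbols.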
K. Irie and R. Csordás—Equal contribution. Work done at IDSIA.
Notes
- 1. Here we use T to denote both the number of inputs and the number of iterations.
- 2. Further experimental details are available through the link on page 1.
- 3. We also observed that KSOM generally improves codebook utilisation. Further illustrations are available through the link provided on page 1.
- 4. Empirical results comparing KSOM and SOM-VAE are provided through the link on page 1. We find that the final reconstruction loss is similar for the EMA baseline, SOM-VAE, and KSOM, but KSOM converges the fastest (significantly faster than SOM-VAE). We also confirm that EMA-based codebook learning outperforms the gradient-based one, as noted by the original authors of VQ-VAE. We focus on KSOM because it is the natural extension of EMA-VQ, which is recommended over the gradient-based variant (SOM-VAE is the natural extension of the gradient-based variant).
References
Agustsson, E., et al.: Soft-to-hard vector quantization for end-to-end learning compressible representations. In: Proceedings of Advances in Neural Information Processing Systems (NIPS), pp. 1141–1151 (2017)
Amari, S.I.: Topographic organization of nerve fields. Bull. Math. Biol. 42(3), 339–364 (1980)
Baevski, A., Schneider, S., Auli, M.: vq-wav2vec: self-supervised learning of discrete speech representations. In: International Conference on Learning Representations (ICLR). Virtual only (2020)
Bengio, Y., Léonard, N., Courville, A.: Estimating or propagating gradients through stochastic neurons for conditional computation. Preprint arXiv:1308.3432 (2013)
Borsos, Z., et al.: AudioLM: a language modeling approach to audio generation. Preprint arXiv:2209.03143 (2022)
Chen, S.F., Goodman, J.: An empirical study of smoothing techniques for language modeling. Comput. Speech Lang. 13(4), 359–393 (1999)
Choi, Y., Uh, Y., Yoo, J., Ha, J.: StarGAN v2: diverse image synthesis for multiple domains. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 8185–8194. Virtual only (2020)
Constantinescu, A.O., O’Reilly, J.X., Behrens, T.E.: Organizing conceptual knowledge in humans with a gridlike code. Science 352(6292), 1464–1468 (2016)
Cottrell, M., Olteanu, M., Rossi, F., Villa-Vialaneix, N.: Self-organizing maps, theory and applications. Revista de Investigacion Operacional 39(1), 1–22 (2018)
Csordás, R., Irie, K., Schmidhuber, J.: CTL++: Evaluating generalization on never-seen compositional patterns of known functions, and compatibility of neural representations. In: Proceedings of Conference on Empirical Methods in Natural Language Processing (EMNLP), Abu Dhabi, UAE (2022)
De Bodt, E., Cottrell, M., Letremy, P., Verleysen, M.: On the use of self-organizing maps to accelerate vector quantization. Neurocomputing 56, 187–203 (2004)
Deng, J., Dong, W., Socher, R., Li, L., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Miami, Florida, USA, pp. 248–255 (2009)
Dhariwal, P., Jun, H., Payne, C., Kim, J.W., Radford, A., Sutskever, I.: Jukebox: a generative model for music. Preprint arXiv:2005.00341 (2020)
Esser, P., Rombach, R., Ommer, B.: Taming transformers for high-resolution image synthesis. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 12873–12883. Virtual only (2021)
Fortuin, V., Hüser, M., Locatello, F., Strathmann, H., Rätsch, G.: SOM-VAE: interpretable discrete representation learning on time series. In: International Conference on Learning Representations (ICLR), New Orleans, LA, USA (May 2019)
Fritzke, B.: A growing neural gas network learns topologies. In: Proceedings of Advances in Neural Information Processing Systems (NIPS), Denver, CO, USA, pp. 625–632 (1994)
Hebb, D.O.: The organization of behavior; a neuropsychological theory. Wiley Book Clin. Psychol. 62, 78 (1949)
Hinton, G.: Neural networks for machine learning. Coursera, video lectures (2012)
Hu, W., Miyato, T., Tokui, S., Matsumoto, E., Sugiyama, M.: Learning discrete representations via information maximizing self-augmented training. In: Proceedings of International Conference on Machine Learning (ICML), Sydney, Australia, pp. 1558–1567 (2017)
Hupkes, D., Singh, A., Korrel, K., Kruszewski, G., Bruni, E.: Learning compositionally through attentive guidance. In: Proceedings of International Conference on Computational Linguistics and Intelligent Text Processing, La Rochelle, France (2019)
Jang, E., Gu, S., Poole, B.: Categorical reparameterization with Gumbel-Softmax. In: International Conference on Learning Representations (ICLR), Toulon, France (2017)
Kaiser, L., Bengio, S., Roy, A., Vaswani, A., Parmar, N., Uszkoreit, J., Shazeer, N.: Fast decoding in sequence models using discrete latent variables. In: Proceedings of International Conference on Machine Learning (ICML), Stockholm, Sweden, pp. 2395–2404 (2018)
Karras, T., Aila, T., Laine, S., Lehtinen, J.: Progressive growing of GANs for improved quality, stability, and variation. In: International Conference on Learning Representations (ICLR), Vancouver, Canada (2018)
Keller, T.A., Welling, M.: Topographic VAEs learn equivariant capsules. In: Proceedings of Advances in Neural Information Processing Systems (NeurIPS), pp. 28585–28597. Virtual only (2021)
Kingma, D.P., Welling, M.: Auto-encoding variational Bayes. In: International Conference on Learning Representations (ICLR), Banff, Canada (2014)
Kohonen, T.: Self-organized formation of topologically correct feature maps. Biol. Cybern. 43(1), 59–69 (1982)
Kohonen, T.: Comparison of SOM point densities based on different criteria. Neural Comput. 11(8), 2081–2095 (1999)
Kohonen, T.: Self-organizing Maps. Springer, Heidelberg (2001). https://doi.org/10.1007/978-3-642-56927-2
Krizhevsky, A.: Learning multiple layers of features from tiny images. Master’s thesis, Computer Science Department, University of Toronto (2009)
Lee, D., Kim, C., Kim, S., Cho, M., Han, W.S.: Autoregressive image generation using residual quantization. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, pp. 11523–11532 (2022)
Liska, A., Kruszewski, G., Baroni, M.: Memorize or generalize? searching for a compositional RNN in a haystack. In: AEGAP Workshop ICML, Stockholm, Sweden (2018)
Liu, D., Niehues, J.: Learning an artificial language for knowledge-sharing in multilingual translation. In: Proceedings of Conference on Machine Translation (WMT), Abu Dhabi, pp. 188–202 (2022)
Liu, D., et al.: Adaptive discrete communication bottlenecks with dynamic vector quantization. Preprint arXiv:2202.01334 (2022)
Liu, D., et al.: Discrete-valued neural communication. In: Proceedings of Advances in Neural Information Processing Systems (NeurIPS), pp. 2109–2121. Virtual only (2021)
Lloyd, S.: Least squares quantization in PCM. IEEE Trans. Inf. Theory 28(2), 129–137 (1982)
MacQueen, J.: Some methods for classification and analysis of multivariate observations. In: Proceedings of Berkeley Symposium on Mathematical Statistics and Probability, pp. 281–297 (1967)
von der Malsburg, C.: Self-organization of orientation sensitive cells in the striate cortex. Kybernetik 14(2), 85–100 (1973)
von der Malsburg, C., Willshaw, D.J.: How to label nerve cells so that they can interconnect in an ordered fashion. Proc. Natl. Acad. Sci. 74(11), 5176–5178 (1977)
Manduchi, L., Hüser, M., Faltys, M., Vogt, J.E., Rätsch, G., Fortuin, V.: T-DPSOM: an interpretable clustering method for unsupervised learning of patient health states. In: Proceedings of Conference on Health, Inference, and Learning (CHIL), pp. 236–245. Virtual only (2021)
Martinetz, T., Schulten, K.: A “neural-gas” network learns topologies. In: Proceedings of International Conference on Artificial Neural Networks (ICANN), Espoo, Finland (1991)
Nasrabadi, N.M., Feng, Y.: Vector quantization of images based upon the Kohonen self-organizing feature maps. In: Proceedings of IEEE International Conference on Neural Networks (ICNN), vol. 1, pp. 101–105 (1988)
Oja, E.: Simplified neuron model as a principal component analyzer. J. Math. Biol. 15(3), 267–273 (1982)
van den Oord, A., Vinyals, O., Kavukcuoglu, K.: Neural discrete representation learning. In: Proceedings of Advances in Neural Information Processing Systems (NIPS), Long Beach, CA, pp. 6306–6315 (2017)
Ozair, S., Li, Y., Razavi, A., Antonoglou, I., van den Oord, A., Vinyals, O.: Vector quantized models for planning. In: Proceedings of International Conference on Machine Learning (ICML), pp. 8302–8313. Virtual only (2021)
Ramesh, A., et al.: Zero-shot text-to-image generation. In: Proceedings of International Conference on Machine Learning (ICML), vol. 139, pp. 8821–8831. Virtual only (2021)
Razavi, A., van den Oord, A., Vinyals, O.: Generating diverse high-fidelity images with VQ-VAE-2. In: Proceedings of Advances in Neural Information Processing Systems (NeurIPS), Vancouver, Canada, pp. 14837–14847 (2019)
Rombach, R., Blattmann, A., Lorenz, D., Esser, P., Ommer, B.: High-resolution image synthesis with latent diffusion models. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, pp. 10674–10685 (2022)
Roy, A., Vaswani, A., Parmar, N., Neelakantan, A.: Towards a better understanding of vector quantized autoencoders. OpenReview (2018)
Schlag, I., Irie, K., Schmidhuber, J.: Linear Transformers are secretly fast weight programmers. In: Proceedings of International Conference on Machine Learning (ICML). Virtual only (2021)
Schmidhuber, J.: Learning to control fast-weight memories: an alternative to recurrent nets. Technical Report. FKI-147-91, Institut für Informatik, Technische Universität München (March 1991)
Tirunagari, S., Bull, S., Kouchaki, S., Cooke, D., Poh, N.: Visualisation of survey responses using self-organising maps: a case study on diabetes self-care factors. In: Proceedings of IEEE Symposium Series on Computational Intelligence (SSCI), pp. 1–6 (2016)
Tjandra, A., Sakti, S., Nakamura, S.: Transformer VQ-VAE for unsupervised unit discovery and speech synthesis: ZeroSpeech 2020 challenge. In: Proceedings of Interspeech, pp. 4851–4855. Virtual only (2020)
Träuble, F., et al.: Discrete key-value bottleneck. Preprint arXiv:2207.11240 (2022)
Vaswani, A., et al.: Attention is all you need. In: Proceedings of Advances in Neural Information Processing Systems (NIPS), Long Beach, CA, USA, pp. 5998–6008 (2017)
Walker, J., Razavi, A., van den Oord, A.: Predicting video with VQ-VAE. Preprint arXiv:2103.01950 (2021)
Willshaw, D.J., von der Malsburg, C.: How patterned neural connections can be set up by self-organization. Proc. Roy. Soc. Lond. B, Biol. Sci. 194(1117), 431–445 (1976)
Willshaw, D.J., von der Malsburg, C.: A marker induction mechanism for the establishment of ordered neural mappings: its application to the retinotectal problem. Phil. Trans. Roy. Soc. Lond. B, Biol. Sci. 287(1021), 203–243 (1979)
Yan, W., Zhang, Y., Abbeel, P., Srinivas, A.: VideoGPT: video generation using vq-vae and transformers. Preprint arXiv:2104.10157 (2021)
Yin, H.: The self-organizing maps: background, theories, extensions and applications. In: Computational Intelligence: A Compendium, pp. 715–762 (2008)
Yu, J., et al.: Vector-quantized image modeling with improved VQGAN. In: International Conference on Learning Representations (ICLR). Virtual only (2022)
Yu, J., et al.: Scaling autoregressive models for content-rich text-to-image generation. Preprint arXiv:2206.10789 (2022)
Zeghidour, N., Luebs, A., Omran, A., Skoglund, J., Tagliasacchi, M.: Soundstream: an end-to-end neural audio codec. IEEE/ACM Trans. Audio Speech Lang. Process. 30, 495–507 (2021)
Acknowledgments
This research was partially funded by ERC Advanced grant no: 742870, project AlgoRNN, and by Swiss National Science Foundation grant no: 200021_192356, project NEUSYM. We are thankful for hardware donations from NVIDIA and IBM. The resources used for this work were partially provided by Swiss National Supercomputing Centre (CSCS) project s1145 and s1154.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Irie, K., Csordás, R., Schmidhuber, J. (2024). Self-organising Neural Discrete Representation Learning à la Kohonen. In: Wand, M., Malinovská, K., Schmidhuber, J., Tetko, I.V. (eds) Artificial Neural Networks and Machine Learning – ICANN 2024. ICANN 2024. Lecture Notes in Computer Science, vol 15016. Springer, Cham. https://doi.org/10.1007/978-3-031-72332-2_23
Print ISBN: 978-3-031-72331-5
Online ISBN: 978-3-031-72332-2