
Self-organising Neural Discrete Representation Learning à la Kohonen

  • Conference paper
  • In: Artificial Neural Networks and Machine Learning – ICANN 2024 (ICANN 2024)
  • Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15016)


Abstract

Unsupervised learning of discrete representations in neural networks (NNs) from continuous ones is essential for many modern applications. Vector Quantisation (VQ) has become popular for this, in particular in the context of generative models, such as Variational Auto-Encoders (VAEs), where the exponential moving average-based VQ (EMA-VQ) algorithm is often used. Here, we study an alternative VQ algorithm based on Kohonen’s learning rule for the Self-Organising Map (KSOM; 1982). EMA-VQ is a special case of KSOM. KSOM is known to offer two potential benefits: empirically, it converges faster than EMA-VQ, and KSOM-generated discrete representations form a topological structure on the grid whose nodes are the discrete symbols, resulting in an artificial version of the brain’s topographic map. We revisit these properties by using KSOM in VQ-VAEs for image processing. In our experiments, the speed-up compared to well-configured EMA-VQ is only observable at the beginning of training, but KSOM is generally much more robust, e.g., w.r.t. the choice of initialisation schemes (Our code is public: https://github.com/IDSIA/kohonen-vae. The full version with an appendix can be found at: https://arxiv.org/abs/2302.07950).
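To make the relationship described in the abstract concrete, here is a minimal NumPy sketch of the Kohonen (KSOM) codebook update: the winning code and its neighbours on a 2-D grid are all pulled toward the input, with a Gaussian neighbourhood kernel over grid distance. This is an illustrative toy, not the paper's released code; the grid size, dimensionality, learning rate, and kernel width are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: an 8x8 grid of codes, 16-dimensional vectors.
grid_h, grid_w, d = 8, 8, 16
K = grid_h * grid_w
codebook = rng.normal(size=(K, d))

# 2-D grid coordinate of each code; the neighborhood lives on this grid,
# which is what induces the topological structure among discrete symbols.
coords = np.stack(
    np.meshgrid(np.arange(grid_h), np.arange(grid_w), indexing="ij"),
    axis=-1,
).reshape(K, 2)

def ksom_step(x, codebook, lr=0.1, sigma=1.5):
    """One Kohonen update: move the winner and its grid neighbors toward x."""
    winner = int(np.argmin(((codebook - x) ** 2).sum(axis=1)))
    grid_dist2 = ((coords - coords[winner]) ** 2).sum(axis=1)
    h = np.exp(-grid_dist2 / (2.0 * sigma ** 2))   # neighborhood kernel
    codebook += lr * h[:, None] * (x - codebook)   # Kohonen learning rule
    return winner

for _ in range(100):
    ksom_step(rng.normal(size=d), codebook)
```

As sigma shrinks toward zero, the kernel h collapses to a one-hot at the winner and the rule degenerates into a winner-only update, which is the sense in which EMA-VQ can be viewed as a special case of KSOM.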

K. Irie and R. Csordás—Equal contribution. Work done at IDSIA.


Notes

  1. Here we use T as both the number of inputs and the number of iterations.

  2. Further experimental details are available through the link on page 1.

  3. We also observed that KSOM generally improves codebook utilisation. Further illustrations are available through the link provided on page 1.

  4. Empirical results comparing KSOM and SOM-VAE are provided through the link on page 1. We find that the final reconstruction loss is similar for the EMA baseline, SOM-VAE, and KSOM, but KSOM converges the fastest (significantly faster than SOM-VAE). We also confirm that EMA-based codebook learning outperforms gradient-based learning, as noted by the original authors of VQ-VAE. We focus on KSOM because it is the natural extension of EMA, which is recommended over the gradient-based variant (SOM-VAE is a natural extension of the gradient-based variant).
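For contrast with the Kohonen rule, the winner-only EMA codebook update discussed in note 4 can be sketched as follows. Each code maintains exponential moving averages of its assignment count and of the sum of inputs assigned to it, and the code vector is their ratio. This is a simplified NumPy illustration in the spirit of the VQ-VAE EMA update, not the paper's implementation; sizes, gamma, and eps are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
K, d = 64, 16
codebook = rng.normal(size=(K, d))
ema_count = np.ones(K)        # N_k: EMA of per-code assignment counts
ema_sum = codebook.copy()     # m_k: EMA of summed inputs per code

def ema_vq_step(batch, codebook, ema_count, ema_sum, gamma=0.99, eps=1e-5):
    """Winner-only EMA update: each code tracks a running mean of the
    inputs assigned to it (no grid neighborhood, unlike KSOM)."""
    d2 = ((batch[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    assign = d2.argmin(axis=1)                  # nearest code per input
    onehot = np.eye(len(codebook))[assign]      # (batch_size, K)
    ema_count *= gamma
    ema_count += (1.0 - gamma) * onehot.sum(axis=0)
    ema_sum *= gamma
    ema_sum += (1.0 - gamma) * (onehot.T @ batch)
    codebook[:] = ema_sum / (ema_count[:, None] + eps)
    return assign

for _ in range(10):
    ema_vq_step(rng.normal(size=(32, d)), codebook, ema_count, ema_sum)
```

Replacing the one-hot assignment matrix with a grid-neighborhood-smoothed version of it is, roughly, what turns this update into the KSOM variant studied in the paper.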

References

  1. Agustsson, E., et al.: Soft-to-hard vector quantization for end-to-end learning compressible representations. In: Proceedings of Advances in Neural Information Processing Systems (NIPS), pp. 1141–1151 (2017)

  2. Amari, S.I.: Topographic organization of nerve fields. Bull. Math. Biol. 42(3), 339–364 (1980)

  3. Baevski, A., Schneider, S., Auli, M.: vq-wav2vec: self-supervised learning of discrete speech representations. In: International Conference on Learning Representations (ICLR), virtual only (2020)

  4. Bengio, Y., Léonard, N., Courville, A.: Estimating or propagating gradients through stochastic neurons for conditional computation. Preprint arXiv:1308.3432 (2013)

  5. Borsos, Z., et al.: AudioLM: a language modeling approach to audio generation. Preprint arXiv:2209.03143 (2022)

  6. Chen, S.F., Goodman, J.: An empirical study of smoothing techniques for language modeling. Comput. Speech Lang. 13(4), 359–393 (1999)

  7. Choi, Y., Uh, Y., Yoo, J., Ha, J.: StarGAN v2: diverse image synthesis for multiple domains. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 8185–8194, virtual only (2020)

  8. Constantinescu, A.O., O'Reilly, J.X., Behrens, T.E.: Organizing conceptual knowledge in humans with a gridlike code. Science 352(6292), 1464–1468 (2016)

  9. Cottrell, M., Olteanu, M., Rossi, F., Villa-Vialaneix, N.: Self-organizing maps, theory and applications. Revista de Investigacion Operacional 39(1), 1–22 (2018)

  10. Csordás, R., Irie, K., Schmidhuber, J.: CTL++: evaluating generalization on never-seen compositional patterns of known functions, and compatibility of neural representations. In: Proceedings of Conference on Empirical Methods in Natural Language Processing (EMNLP), Abu Dhabi, UAE (2022)

  11. De Bodt, E., Cottrell, M., Letremy, P., Verleysen, M.: On the use of self-organizing maps to accelerate vector quantization. Neurocomputing 56, 187–203 (2004)

  12. Deng, J., Dong, W., Socher, R., Li, L., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Miami, FL, USA, pp. 248–255 (2009)

  13. Dhariwal, P., Jun, H., Payne, C., Kim, J.W., Radford, A., Sutskever, I.: Jukebox: a generative model for music. Preprint arXiv:2005.00341 (2020)

  14. Esser, P., Rombach, R., Ommer, B.: Taming transformers for high-resolution image synthesis. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 12873–12883, virtual only (2021)

  15. Fortuin, V., Hüser, M., Locatello, F., Strathmann, H., Rätsch, G.: SOM-VAE: interpretable discrete representation learning on time series. In: International Conference on Learning Representations (ICLR), New Orleans, LA, USA (2019)

  16. Fritzke, B.: A growing neural gas network learns topologies. In: Proceedings of Advances in Neural Information Processing Systems (NIPS), Denver, CO, USA, pp. 625–632 (1994)

  17. Hebb, D.O.: The organization of behavior; a neuropsychological theory. Wiley (1949)

  18. Hinton, G.: Neural networks for machine learning. Coursera, video lectures (2012)

  19. Hu, W., Miyato, T., Tokui, S., Matsumoto, E., Sugiyama, M.: Learning discrete representations via information maximizing self-augmented training. In: Proceedings of International Conference on Machine Learning (ICML), Sydney, Australia, pp. 1558–1567 (2017)

  20. Hupkes, D., Singh, A., Korrel, K., Kruszewski, G., Bruni, E.: Learning compositionally through attentive guidance. In: Proceedings of International Conference on Computational Linguistics and Intelligent Text Processing, La Rochelle, France (2019)

  21. Jang, E., Gu, S., Poole, B.: Categorical reparameterization with Gumbel-Softmax. In: International Conference on Learning Representations (ICLR), Toulon, France (2017)

  22. Kaiser, L., Bengio, S., Roy, A., Vaswani, A., Parmar, N., Uszkoreit, J., Shazeer, N.: Fast decoding in sequence models using discrete latent variables. In: Proceedings of International Conference on Machine Learning (ICML), Stockholm, Sweden, pp. 2395–2404 (2018)

  23. Karras, T., Aila, T., Laine, S., Lehtinen, J.: Progressive growing of GANs for improved quality, stability, and variation. In: International Conference on Learning Representations (ICLR), Vancouver, Canada (2018)

  24. Keller, T.A., Welling, M.: Topographic VAEs learn equivariant capsules. In: Proceedings of Advances in Neural Information Processing Systems (NeurIPS), pp. 28585–28597, virtual only (2021)

  25. Kingma, D.P., Welling, M.: Auto-encoding variational Bayes. In: International Conference on Learning Representations (ICLR), Banff, Canada (2014)

  26. Kohonen, T.: Self-organized formation of topologically correct feature maps. Biol. Cybern. 43(1), 59–69 (1982)

  27. Kohonen, T.: Comparison of SOM point densities based on different criteria. Neural Comput. 11(8), 2081–2095 (1999)

  28. Kohonen, T.: Self-Organizing Maps. Springer, Heidelberg (2001). https://doi.org/10.1007/978-3-642-56927-2

  29. Krizhevsky, A.: Learning multiple layers of features from tiny images. Master's thesis, Computer Science Department, University of Toronto (2009)

  30. Lee, D., Kim, C., Kim, S., Cho, M., Han, W.S.: Autoregressive image generation using residual quantization. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, pp. 11523–11532 (2022)

  31. Liska, A., Kruszewski, G., Baroni, M.: Memorize or generalize? Searching for a compositional RNN in a haystack. In: AEGAP Workshop at ICML, Stockholm, Sweden (2018)

  32. Liu, D., Niehues, J.: Learning an artificial language for knowledge-sharing in multilingual translation. In: Proceedings of Conference on Machine Translation (WMT), Abu Dhabi, pp. 188–202 (2022)

  33. Liu, D., et al.: Adaptive discrete communication bottlenecks with dynamic vector quantization. Preprint arXiv:2202.01334 (2022)

  34. Liu, D., et al.: Discrete-valued neural communication. In: Proceedings of Advances in Neural Information Processing Systems (NeurIPS), pp. 2109–2121, virtual only (2021)

  35. Lloyd, S.: Least squares quantization in PCM. IEEE Trans. Inf. Theory 28(2), 129–137 (1982)

  36. MacQueen, J.: Classification and analysis of multivariate observations. In: Proceedings of Berkeley Symposium on Mathematical Statistics and Probability, pp. 281–297 (1967)

  37. von der Malsburg, C.: Self-organization of orientation sensitive cells in the striate cortex. Kybernetik 14(2), 85–100 (1973)

  38. von der Malsburg, C., Willshaw, D.J.: How to label nerve cells so that they can interconnect in an ordered fashion. Proc. Natl. Acad. Sci. 74(11), 5176–5178 (1977)

  39. Manduchi, L., Hüser, M., Faltys, M., Vogt, J.E., Rätsch, G., Fortuin, V.: T-DPSOM: an interpretable clustering method for unsupervised learning of patient health states. In: Proceedings of Conference on Health, Inference, and Learning (CHIL), pp. 236–245, virtual only (2021)

  40. Martinetz, T., Schulten, K.: A "neural-gas" network learns topologies. In: Proceedings of International Conference on Artificial Neural Networks (ICANN), Espoo, Finland (1991)

  41. Nasrabadi, N.M., Feng, Y.: Vector quantization of images based upon the Kohonen self-organizing feature maps. In: Proceedings of IEEE International Conference on Neural Networks (ICNN), vol. 1, pp. 101–105 (1988)

  42. Oja, E.: Simplified neuron model as a principal component analyzer. J. Math. Biol. 15(3), 267–273 (1982)

  43. van den Oord, A., Vinyals, O., Kavukcuoglu, K.: Neural discrete representation learning. In: Proceedings of Advances in Neural Information Processing Systems (NIPS), Long Beach, CA, USA, pp. 6306–6315 (2017)

  44. Ozair, S., Li, Y., Razavi, A., Antonoglou, I., van den Oord, A., Vinyals, O.: Vector quantized models for planning. In: Proceedings of International Conference on Machine Learning (ICML), pp. 8302–8313, virtual only (2021)

  45. Ramesh, A., et al.: Zero-shot text-to-image generation. In: Proceedings of International Conference on Machine Learning (ICML), vol. 139, pp. 8821–8831, virtual only (2021)

  46. Razavi, A., van den Oord, A., Vinyals, O.: Generating diverse high-fidelity images with VQ-VAE-2. In: Proceedings of Advances in Neural Information Processing Systems (NeurIPS), Vancouver, Canada, pp. 14837–14847 (2019)

  47. Rombach, R., Blattmann, A., Lorenz, D., Esser, P., Ommer, B.: High-resolution image synthesis with latent diffusion models. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, pp. 10674–10685 (2022)

  48. Roy, A., Vaswani, A., Parmar, N., Neelakantan, A.: Towards a better understanding of vector quantized autoencoders. OpenReview (2018)

  49. Schlag, I., Irie, K., Schmidhuber, J.: Linear Transformers are secretly fast weight programmers. In: Proceedings of International Conference on Machine Learning (ICML), virtual only (2021)

  50. Schmidhuber, J.: Learning to control fast-weight memories: an alternative to recurrent nets. Technical report FKI-147-91, Institut für Informatik, Technische Universität München (1991)

  51. Tirunagari, S., Bull, S., Kouchaki, S., Cooke, D., Poh, N.: Visualisation of survey responses using self-organising maps: a case study on diabetes self-care factors. In: Proceedings of IEEE Symposium Series on Computational Intelligence (SSCI), pp. 1–6 (2016)

  52. Tjandra, A., Sakti, S., Nakamura, S.: Transformer VQ-VAE for unsupervised unit discovery and speech synthesis: ZeroSpeech 2020 challenge. In: Proceedings of Interspeech, pp. 4851–4855, virtual only (2020)

  53. Träuble, F., et al.: Discrete key-value bottleneck. Preprint arXiv:2207.11240 (2022)

  54. Vaswani, A., et al.: Attention is all you need. In: Proceedings of Advances in Neural Information Processing Systems (NIPS), Long Beach, CA, USA, pp. 5998–6008 (2017)

  55. Walker, J., Razavi, A., van den Oord, A.: Predicting video with VQVAE. Preprint arXiv:2103.01950 (2021)

  56. Willshaw, D.J., von der Malsburg, C.: How patterned neural connections can be set up by self-organization. Proc. Roy. Soc. Lond. Ser. B Biol. Sci. 194(1117), 431–445 (1976)

  57. Willshaw, D.J., von der Malsburg, C.: A marker induction mechanism for the establishment of ordered neural mappings: its application to the retinotectal problem. Phil. Trans. Roy. Soc. Lond. B Biol. Sci. 287(1021), 203–243 (1979)

  58. Yan, W., Zhang, Y., Abbeel, P., Srinivas, A.: VideoGPT: video generation using VQ-VAE and transformers. Preprint arXiv:2104.10157 (2021)

  59. Yin, H.: The self-organizing maps: background, theories, extensions and applications. In: Computational Intelligence: A Compendium, pp. 715–762 (2008)

  60. Yu, J., et al.: Vector-quantized image modeling with improved VQGAN. In: International Conference on Learning Representations (ICLR), virtual only (2022)

  61. Yu, J., et al.: Scaling autoregressive models for content-rich text-to-image generation. Preprint arXiv:2206.10789 (2022)

  62. Zeghidour, N., Luebs, A., Omran, A., Skoglund, J., Tagliasacchi, M.: SoundStream: an end-to-end neural audio codec. IEEE/ACM Trans. Audio Speech Lang. Process. 30, 495–507 (2021)


Acknowledgments

This research was partially funded by ERC Advanced grant no: 742870, project AlgoRNN, and by Swiss National Science Foundation grant no: 200021_192356, project NEUSYM. We are thankful for hardware donations from NVIDIA and IBM. The resources used for this work were partially provided by Swiss National Supercomputing Centre (CSCS) project s1145 and s1154.

Author information


Correspondence to Kazuki Irie or Róbert Csordás.



Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Irie, K., Csordás, R., Schmidhuber, J. (2024). Self-organising Neural Discrete Representation Learning à la Kohonen. In: Wand, M., Malinovská, K., Schmidhuber, J., Tetko, I.V. (eds) Artificial Neural Networks and Machine Learning – ICANN 2024. ICANN 2024. Lecture Notes in Computer Science, vol 15016. Springer, Cham. https://doi.org/10.1007/978-3-031-72332-2_23


  • DOI: https://doi.org/10.1007/978-3-031-72332-2_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-72331-5

  • Online ISBN: 978-3-031-72332-2

