
Visualization and clustering of categorical data with probabilistic self-organizing map

  • Original Article
Neural Computing and Applications

Abstract

This paper introduces a self-organizing map dedicated to the clustering, analysis and visualization of categorical data. Topological maps usually handle categorical data through an encoding stage: the categorical data are converted into numerical vectors, and a traditional numerical algorithm (SOM) is then run. Here we propose a novel probabilistic formalism of the Kohonen map dedicated to categorical data, in which neurons are represented by probability tables, so no encoding of the variables is needed. We evaluate the effectiveness of our model on four examples using real data. Our experiments show that the model provides good-quality results on categorical data.
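
To make the contrast in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It compares the encoding stage used by conventional maps with a neuron held as probability tables. The toy variables, the names `one_hot` and `table_likelihood`, and in particular the assumption that variables are independent given the neuron are illustrative choices that the abstract does not specify.

```python
from math import prod

# Hypothetical toy schema: two categorical variables and their levels.
CATEGORIES = [["red", "green", "blue"], ["small", "large"]]


def one_hot(observation):
    """Encoding stage: turn a categorical observation into a numerical
    vector by concatenating one indicator vector per variable."""
    vector = []
    for value, levels in zip(observation, CATEGORIES):
        vector.extend(1.0 if value == level else 0.0 for level in levels)
    return vector


def table_likelihood(observation, tables):
    """Probability of an observation under one neuron's probability tables.
    ASSUMPTION (this sketch only): variables are independent given the
    neuron, so the per-variable probabilities are multiplied."""
    return prod(table[value] for value, table in zip(observation, tables))


# One hypothetical neuron: one probability table per variable,
# each table summing to 1 over the levels of its variable.
neuron = [
    {"red": 0.7, "green": 0.2, "blue": 0.1},
    {"small": 0.4, "large": 0.6},
]

x = ("red", "large")
encoded = one_hot(x)                       # numerical vector a classical SOM would use
likelihood = table_likelihood(x, neuron)   # computed directly on the raw categories
```

In the first route the observation must be recoded before any distance can be computed. In the second, the raw category labels index the tables directly, which is the sense in which no encoding is required.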



Author information

Corresponding author

Correspondence to Khalid Benabdeslem.

About this article

Cite this article

Lebbah, M., Benabdeslem, K. Visualization and clustering of categorical data with probabilistic self-organizing map. Neural Comput & Applic 19, 393–404 (2010). https://doi.org/10.1007/s00521-009-0299-2

  • DOI: https://doi.org/10.1007/s00521-009-0299-2
