Abstract
This paper introduces a self-organizing map dedicated to the clustering, analysis, and visualization of categorical data. Topological maps usually handle categorical data through an encoding stage: the categorical data are converted into numerical vectors, and a traditional numerical algorithm (SOM) is then run. Here we propose a novel probabilistic formalism of the Kohonen map dedicated to categorical data, in which neurons are represented by probability tables, so no encoding of the variables is needed. We evaluate the effectiveness of the model on four examples using real data. The experiments show that the model gives good-quality results on categorical data.
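
To make the idea of neurons represented by probability tables concrete, here is a minimal Python sketch. It is not the algorithm of the paper: the names `neurons`, `log_likelihood`, and `best_neuron`, the toy categories and probabilities, and the assumption that the variables are independent given the neuron are all illustrative, and the topological neighbourhood of the map is left out entirely. It only shows how an observation can be scored against a neuron directly from its raw categories, with no numerical encoding.

```python
import math

# Hypothetical toy setup: two categorical variables (colour, size), two neurons.
# Each neuron holds one probability table per variable (category -> probability).
neurons = [
    [{"red": 0.7, "green": 0.2, "blue": 0.1}, {"small": 0.8, "large": 0.2}],
    [{"red": 0.1, "green": 0.3, "blue": 0.6}, {"small": 0.3, "large": 0.7}],
]


def log_likelihood(observation, tables):
    """Log-probability of an observation under one neuron.

    Assumption (not stated in the abstract): the variables are treated as
    independent given the neuron, so the per-variable probabilities multiply.
    """
    return sum(math.log(table[value]) for value, table in zip(observation, tables))


def best_neuron(observation, neurons):
    """Index of the neuron whose tables give the observation the highest likelihood."""
    return max(range(len(neurons)), key=lambda k: log_likelihood(observation, neurons[k]))


# The raw categories are looked up directly; no numerical vector is built.
print(best_neuron(("red", "small"), neurons))   # neuron 0: 0.7 * 0.8 beats 0.1 * 0.3
print(best_neuron(("blue", "large"), neurons))  # neuron 1: 0.6 * 0.7 beats 0.1 * 0.2
```

In a full model the tables would be estimated from the data rather than fixed by hand, and the assignment would also account for the neighbourhood structure of the map.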

Cite this article
Lebbah, M., Benabdeslem, K. Visualization and clustering of categorical data with probabilistic self-organizing map. Neural Comput & Applic 19, 393–404 (2010). https://doi.org/10.1007/s00521-009-0299-2
DOI: https://doi.org/10.1007/s00521-009-0299-2