Visualization and clustering of categorical data with probabilistic self-organizing map

Lebbah, Mustapha; Benabdeslem, Khalid

doi:10.1007/s00521-009-0299-2

Original Article
Published: 10 September 2009

Volume 19, pages 393–404 (2010)

Neural Computing and Applications

Mustapha Lebbah¹ & Khalid Benabdeslem²

Abstract

This paper introduces a self-organizing map dedicated to the clustering, analysis and visualization of categorical data. Topological maps usually handle categorical data through an encoding stage: the data are converted into numerical vectors, and a traditional numerical algorithm (SOM) is then run on them. In the present paper, we propose a novel probabilistic formalism of the Kohonen map dedicated to categorical data, in which neurons are represented by probability tables, so the variables do not need to be encoded. We evaluate the effectiveness of our model on four examples using real data. Our experiments show that the model gives good-quality results on categorical data.
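
To make the idea of neurons "represented by probability tables" concrete, the following is a minimal illustrative sketch and not the authors' algorithm. It assumes that each neuron holds one probability table per categorical variable, that variables are independent within a neuron, and that an observation is assigned to the neuron under which it is most probable. The names `make_neuron`, `log_likelihood` and `best_matching_cell` are hypothetical, and the neighbourhood function and the learning procedure of the map are left out entirely.

```python
import math
import random


def make_neuron(cardinalities, rng):
    """Build one hypothetical neuron: a probability table per categorical variable.

    cardinalities[j] is the number of categories of variable j.
    Tables are randomly initialised here; a real model would learn them.
    """
    tables = []
    for k in cardinalities:
        weights = [rng.random() + 0.1 for _ in range(k)]  # strictly positive
        total = sum(weights)
        tables.append([w / total for w in weights])  # normalise to sum to 1
    return tables


def log_likelihood(neuron, x):
    """Log-probability of observation x under one neuron.

    x[j] is the category index of variable j. Assumption: variables are
    independent within a neuron, so the log-probabilities add up.
    """
    return sum(math.log(table[value]) for table, value in zip(neuron, x))


def best_matching_cell(neurons, x):
    """Index of the neuron under which x is most probable."""
    return max(range(len(neurons)), key=lambda c: log_likelihood(neurons[c], x))


rng = random.Random(0)
cardinalities = [3, 2, 4]  # three categorical variables (illustrative)
neurons = [make_neuron(cardinalities, rng) for _ in range(6)]

x = [2, 0, 3]  # one observation, given directly as category indices
winner = best_matching_cell(neurons, x)
print("best matching cell:", winner)
```

The observation `x` is used as category indices throughout; no binary or numerical coding of the variables is built, which is the point the abstract makes about avoiding an encoding stage.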

Author information

Authors and Affiliations

University of Paris 13, LIPN-UMR 7030, CNRS, 99, av. J-B Clément, 93430, Villetaneuse, France
Mustapha Lebbah
University of Lyon 1, LIESP EA4125, 69622, Lyon, France
Khalid Benabdeslem

Corresponding author

Correspondence to Khalid Benabdeslem.

About this article

Cite this article

Lebbah, M., Benabdeslem, K. Visualization and clustering of categorical data with probabilistic self-organizing map. Neural Comput & Applic 19, 393–404 (2010). https://doi.org/10.1007/s00521-009-0299-2

Received: 11 October 2008
Accepted: 19 August 2009
Published: 10 September 2009
Issue Date: April 2010
DOI: https://doi.org/10.1007/s00521-009-0299-2
