Symbolic Clustering of Large Datasets

  • Conference paper
Data Science and Classification

Abstract

We present an approach to clustering large datasets that integrates Kohonen Self-Organizing Maps (SOM) with a dynamic clustering algorithm for symbolic data (SCLUST). First, a preliminary data reduction is performed with the SOM algorithm, which replaces the individual measurements with micro-clusters. These micro-clusters are then grouped into a few clusters, each modeled by a symbolic object. By computing the extensions of these symbolic objects, the symbolic clustering algorithm can discover the natural classes. An application to a real dataset shows the usefulness of this methodology.
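The two-stage pipeline described above (SOM reduction to micro-clusters, then symbolic clustering of the micro-clusters) can be sketched roughly as follows. This is a minimal, stdlib-only Python sketch under stated assumptions, not the authors' SCLUST implementation: the SOM grid size and learning-rate schedule, the interval-valued (min/max) description of each micro-cluster, and the squared distance on interval bounds are illustrative choices, and all function names are hypothetical.

```python
import math
import random


def best_unit(weights, x):
    """Grid coordinates of the SOM unit whose weight vector is closest to x."""
    return min(weights, key=lambda u: sum((a - b) ** 2 for a, b in zip(weights[u], x)))


def train_som(data, rows, cols, epochs=20, lr0=0.5, sigma0=None, seed=0):
    """Online SOM training on a rows x cols grid (hypothetical schedule)."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Initialize each unit's weight vector with a randomly chosen data point.
    weights = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    sigma0 = sigma0 or max(rows, cols) / 2.0
    steps, t = epochs * len(data), 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            x, frac = data[i], t / steps
            lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
            sigma = max(0.5, sigma0 * (1.0 - frac))  # shrinking neighborhood radius
            bmu = best_unit(weights, x)
            for u, w in weights.items():
                g = (u[0] - bmu[0]) ** 2 + (u[1] - bmu[1]) ** 2
                h = math.exp(-g / (2 * sigma ** 2))  # Gaussian neighborhood on the grid
                for d in range(dim):
                    w[d] += lr * h * (x[d] - w[d])
            t += 1
    return weights


def micro_clusters(data, weights):
    """Replace individuals by micro-clusters: the non-empty SOM units."""
    groups = {}
    for i, x in enumerate(data):
        groups.setdefault(best_unit(weights, x), []).append(i)
    return list(groups.values())


def interval_description(data, members):
    """Describe a group by one [min, max] interval per variable (assumed symbolic type)."""
    dim = len(data[0])
    return [(min(data[i][d] for i in members), max(data[i][d] for i in members))
            for d in range(dim)]


def interval_dist(a, b):
    # Assumption: squared Euclidean distance on interval bounds, chosen so that
    # the mean of the bounds is the optimal prototype in the representation step.
    return sum((al - bl) ** 2 + (au - bu) ** 2 for (al, au), (bl, bu) in zip(a, b))


def dynamic_clustering(descs, k, iters=50, seed=0):
    """Dynamic clustering (k-means-style) of interval descriptions; needs k <= len(descs)."""
    rng = random.Random(seed)
    protos = [list(desc) for desc in rng.sample(descs, k)]
    labels = None
    for _ in range(iters):
        # Allocation step: assign each micro-cluster to its nearest prototype.
        new = [min(range(k), key=lambda j: interval_dist(desc, protos[j])) for desc in descs]
        if new == labels:
            break
        labels = new
        # Representation step: prototype = mean lower and upper bound per variable.
        for j in range(k):
            mem = [descs[m] for m in range(len(descs)) if labels[m] == j]
            if mem:  # an empty cluster keeps its previous prototype
                protos[j] = [(sum(iv[d][0] for iv in mem) / len(mem),
                              sum(iv[d][1] for iv in mem) / len(mem))
                             for d in range(len(mem[0]))]
    return labels


def symbolic_object(descs, labels, j):
    """Cluster j as a conjunction of intervals: the hull of its micro-clusters' intervals."""
    mem = [descs[m] for m in range(len(descs)) if labels[m] == j]
    return [(min(iv[d][0] for iv in mem), max(iv[d][1] for iv in mem))
            for d in range(len(mem[0]))]


def extension(data, obj):
    """Indices of the individuals that satisfy every interval condition of obj."""
    return {i for i, x in enumerate(data) if all(lo <= v <= hi for v, (lo, hi) in zip(x, obj))}


if __name__ == "__main__":
    rng = random.Random(1)
    data = ([(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(50)]
            + [(rng.gauss(10, 0.5), rng.gauss(10, 0.5)) for _ in range(50)])
    groups = micro_clusters(data, train_som(data, 3, 3, epochs=10))
    descs = [interval_description(data, g) for g in groups]
    labels = dynamic_clustering(descs, k=min(2, len(descs)))
    for j in sorted(set(labels)):
        obj = symbolic_object(descs, labels, j)
        print(j, [(round(lo, 2), round(hi, 2)) for lo, hi in obj], len(extension(data, obj)))
```

Because each individual lies inside its own micro-cluster's intervals, and each cluster's symbolic object is the hull of those intervals, every individual belongs to the extension of its own cluster's object. Extensions of different objects may still overlap, which gives one way to inspect how well separated the discovered classes are.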

Copyright information

© 2006 Springer-Verlag Berlin · Heidelberg

About this paper

Cite this paper

Lechevallier, Y., Verde, R., de Carvalho, F.d.A.T. (2006). Symbolic Clustering of Large Datasets. In: Batagelj, V., Bock, HH., Ferligoj, A., Žiberna, A. (eds) Data Science and Classification. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-34416-0_21