Abstract
We present an approach to clustering large datasets that integrates Kohonen Self-Organizing Maps (SOM) with a dynamic clustering algorithm for symbolic data (SCLUST). A preliminary data reduction is performed with the SOM algorithm, so that the individual measurements are replaced by micro-clusters. These micro-clusters are then grouped into a few clusters, which are modeled by symbolic objects. By computing the extension of these symbolic objects, the symbolic clustering algorithm makes it possible to discover the natural classes. An application to a real data set shows the usefulness of this methodology.
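The two-stage pipeline described above can be illustrated with a minimal sketch. It is not the SCLUST algorithm itself and rests on several assumptions that do not come from the paper: a one-dimensional SOM (a chain of units), micro-clusters described by per-variable intervals, a sum of per-variable Hausdorff distances between intervals, and prototypes updated as the mean of the interval bounds. All function names (`train_som`, `micro_clusters`, `cluster_intervals`, `describe`, `extension`) are hypothetical.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def best_unit(weights, x):
    """Index of the SOM unit whose weight vector is closest to x."""
    return min(range(len(weights)), key=lambda u: sq_dist(weights[u], x))

def train_som(data, n_units, epochs=20, seed=0):
    """Train a small 1-D SOM (assumed topology: a chain of units) online."""
    rng = random.Random(seed)
    weights = [list(rng.choice(data)) for _ in range(n_units)]
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):
            decay = 1.0 - step / total
            lr = 0.5 * decay                 # learning rate shrinks towards 0
            radius = (n_units / 2) * decay   # neighbourhood shrinks towards 0
            bmu = best_unit(weights, x)
            for u, w in enumerate(weights):
                if abs(u - bmu) <= radius:
                    for j in range(len(w)):
                        w[j] += lr * (x[j] - w[j])
            step += 1
    return weights

def micro_clusters(data, weights):
    """One (interval box, member indices) pair per non-empty SOM unit."""
    members = {}
    for i, x in enumerate(data):
        members.setdefault(best_unit(weights, x), []).append(i)
    result = []
    for unit in sorted(members):
        idx = members[unit]
        box = [(min(data[i][j] for i in idx), max(data[i][j] for i in idx))
               for j in range(len(data[0]))]
        result.append((box, idx))
    return result

def hausdorff(a, b):
    """Sum over variables of the Hausdorff distance between two intervals."""
    return sum(max(abs(lo1 - lo2), abs(hi1 - hi2))
               for (lo1, hi1), (lo2, hi2) in zip(a, b))

def cluster_intervals(boxes, k, iters=20):
    """Alternate allocation and representation steps on interval data."""
    protos = [boxes[0]]                      # farthest-first initialisation
    while len(protos) < k:
        protos.append(max(boxes, key=lambda b: min(hausdorff(b, p) for p in protos)))
    labels = []
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: hausdorff(b, protos[c])) for b in boxes]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            group = [b for b, l in zip(boxes, labels) if l == c]
            if group:                        # an empty cluster keeps its old prototype
                protos[c] = [(sum(b[j][0] for b in group) / len(group),
                              sum(b[j][1] for b in group) / len(group))
                             for j in range(len(group[0]))]
    return labels

def describe(boxes, labels, c):
    """Symbolic description of cluster c: the smallest box covering its members."""
    group = [b for b, l in zip(boxes, labels) if l == c]
    if not group:
        return None
    return [(min(b[j][0] for b in group), max(b[j][1] for b in group))
            for j in range(len(group[0]))]

def extension(description, data):
    """Indices of individuals whose values all lie inside the description."""
    return [i for i, x in enumerate(data)
            if all(lo <= v <= hi for v, (lo, hi) in zip(x, description))]

if __name__ == "__main__":
    rng = random.Random(1)                   # synthetic data, for illustration only
    data = [(rng.gauss(m, 0.5), rng.gauss(m, 0.5)) for m in (0.0, 10.0) for _ in range(100)]
    micro = micro_clusters(data, train_som(data, n_units=8))
    boxes = [box for box, _ in micro]
    labels = cluster_intervals(boxes, k=2)
    for c in range(2):
        d = describe(boxes, labels, c)
        if d is not None:
            print(c, d, len(extension(d, data)))
```

In this sketch the expensive step, the SOM pass, touches every individual, while the symbolic clustering step works only on the much smaller set of interval descriptions. The extension is then computed back on the individuals, so the final classes are expressed in terms of the original data.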
Copyright information
© 2006 Springer-Verlag Berlin · Heidelberg
About this paper
Cite this paper
Lechevallier, Y., Verde, R., de Carvalho, F.d.A.T. (2006). Symbolic Clustering of Large Datasets. In: Batagelj, V., Bock, HH., Ferligoj, A., Žiberna, A. (eds) Data Science and Classification. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-34416-0_21
DOI: https://doi.org/10.1007/3-540-34416-0_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-34415-5
Online ISBN: 978-3-540-34416-2
eBook Packages: Mathematics and Statistics (R0)