Abstract
Clustering is an unsupervised learning task that decomposes a dataset into subgroups, which summarize the initial data and give information about its structure. We propose to enrich this result with a numerical coefficient that describes the representativity of each cluster, i.e. the extent to which it is characteristic of the whole dataset. This coefficient is defined for a specific clustering algorithm, the Outlier Preserving Clustering Algorithm (opca). opca detects clusters associated with major trends as well as clusters associated with marginal behaviors, so that it offers a complete description of the initial dataset. The proposed representativity measure exploits the iterative process of opca to compute the typicality of each identified cluster.
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lesot, MJ., Bouchon-Meunier, B. (2004). Cluster Characterization through a Representativity Measure. In: Christiansen, H., Hacid, MS., Andreasen, T., Larsen, H.L. (eds) Flexible Query Answering Systems. FQAS 2004. Lecture Notes in Computer Science, vol 3055. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-25957-2_35
DOI: https://doi.org/10.1007/978-3-540-25957-2_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22160-9
Online ISBN: 978-3-540-25957-2
eBook Packages: Springer Book Archive