Cluster Quality Indexes for Symbolic Classification — An Examination

Dudek, Andrzej

doi:10.1007/978-3-540-70981-7_4

Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization (STUDIES CLASS)

Abstract

The paper presents the difficulties of measuring clustering quality for symbolic data, such as the lack of a “traditional” data matrix. It gives some hints on using known indexes with this kind of data and describes indexes designed exclusively for symbolic data. Finally, after presenting simulation results, it offers proposals for choosing the most adequate indexes for popular classification algorithms.
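
To illustrate the point about the missing data matrix, the sketch below shows how one well-known index, the silhouette of Rousseeuw, can be computed from a dissimilarity matrix alone. It is an illustration under stated assumptions, not the procedure used in the paper: the objects are assumed to be described by interval-valued variables, the dissimilarity is assumed to be the sum over variables of the Hausdorff distance between intervals, and all function names and sample values are hypothetical.

```python
from statistics import mean


def hausdorff(x, y):
    """Sum over variables of the Hausdorff distance between two intervals.

    Each object is a list of (lower, upper) pairs, one pair per variable.
    For closed intervals the Hausdorff distance is max(|a1-a2|, |b1-b2|).
    """
    return sum(max(abs(a1 - a2), abs(b1 - b2))
               for (a1, b1), (a2, b2) in zip(x, y))


def dissimilarity_matrix(objects):
    """Full pairwise dissimilarity matrix; no numeric data matrix is needed."""
    n = len(objects)
    return [[hausdorff(objects[i], objects[j]) for j in range(n)]
            for i in range(n)]


def silhouette(dist, labels):
    """Mean silhouette width computed only from pairwise dissimilarities."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            # Convention: a singleton cluster contributes a width of 0.
            scores.append(0.0)
            continue
        # a: mean dissimilarity to the other members of the same cluster.
        a = mean(dist[i][j] for j in own)
        # b: smallest mean dissimilarity to any other cluster.
        b = min(mean(dist[i][j] for j in members)
                for other, members in clusters.items() if other != lab)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return mean(scores)


# Hypothetical sample: four objects, each with two interval-valued variables.
objects = [
    [(1.0, 2.0), (10.0, 12.0)],
    [(1.5, 2.5), (10.5, 12.5)],
    [(8.0, 9.0), (30.0, 33.0)],
    [(8.5, 9.5), (31.0, 34.0)],
]
dist = dissimilarity_matrix(objects)
good = silhouette(dist, [0, 0, 1, 1])  # partition that follows the two groups
bad = silhouette(dist, [0, 1, 0, 1])   # partition that mixes the two groups
```

In this toy sample the partition that follows the two visible groups scores higher than the one that mixes them. Indexes that rely on cluster centroids or on sums of squares in a numeric data matrix cannot be applied this directly, which is the kind of difficulty the abstract refers to.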

Author information

Authors and Affiliations

Department of Econometrics and Computer Science, Wrocław University of Economics, Nowowiejska 3, 58-500, Jelenia Góra, Poland
Andrzej Dudek

Editor information

Editors and Affiliations

Department of Business Administration and Economics, Bielefeld University, Universitätsstr. 25, 33501, Bielefeld, Germany
Reinhold Decker
Department of Economics, Freie Universität Berlin, Garystraße 21, 14195, Berlin, Germany
Hans-J. Lenz

About this paper

Cite this paper

Dudek, A. (2007). Cluster Quality Indexes for Symbolic Classification — An Examination. In: Decker, R., Lenz, H.J. (eds) Advances in Data Analysis. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70981-7_4

DOI: https://doi.org/10.1007/978-3-540-70981-7_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-70980-0
Online ISBN: 978-3-540-70981-7
eBook Packages: Mathematics and Statistics (R0)
