Abstract
The paper proposes a method for clustering social and economic objects described by many interval-valued variables. This multivariate approach rests on an original concept of interval quantiles, constructed by means of a special definition derived from the Hausdorff distance. To improve the quality of the classification, the resulting interval quantile classes can then be merged into larger classes. The efficiency of our method can be assessed using specially defined entropy indices and volume coefficients; the latter replace the classical concept of area, which is not applicable in this case.
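The abstract does not spell out how interval quantiles are built, so the following is only a minimal sketch of the ingredient it names: the standard Hausdorff distance between two closed intervals [a, b] and [c, d], which equals max(|a - c|, |b - d|). The function names and the choice to combine per-variable distances by summation are assumptions made for illustration; this is not the author's interval-quantile definition.

```python
from typing import Sequence, Tuple

# An interval-valued observation (lower, upper) with lower <= upper.
Interval = Tuple[float, float]


def hausdorff_interval(x: Interval, y: Interval) -> float:
    """Standard Hausdorff distance between closed intervals [a, b] and [c, d]."""
    (a, b), (c, d) = x, y
    if a > b or c > d:
        raise ValueError("interval lower bound exceeds upper bound")
    # For intervals, the Hausdorff distance reduces to the larger endpoint gap.
    return max(abs(a - c), abs(b - d))


def hausdorff_objects(u: Sequence[Interval], v: Sequence[Interval]) -> float:
    """Hypothetical object-level distance: sum of per-variable Hausdorff
    distances (summation is an assumption, not taken from the paper)."""
    if len(u) != len(v):
        raise ValueError("objects must be described by the same variables")
    return sum(hausdorff_interval(x, y) for x, y in zip(u, v))


if __name__ == "__main__":
    print(hausdorff_interval((1.0, 3.0), (2.0, 7.0)))  # 4.0
    print(hausdorff_objects([(1.0, 3.0), (0.0, 1.0)],
                            [(2.0, 7.0), (0.5, 1.0)]))  # 4.5
```

Summation is only one option; taking the maximum of the per-variable distances instead would give a different, equally simple aggregate.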
Additional information
Dedicated to the Memory of Prof. Dr. Wiesław Wagner and Dr. Piotr Błażczak.
I would like to express my gratitude to Mrs. Paula Brito, Associate Professor in Statistics and Data Analysis at the Faculty of Economics (Group of Mathematics and Informatics) of the University of Porto, and to three anonymous referees for their careful reading of my paper and for their very detailed and useful comments and suggestions.
About this article
Cite this article
Młodak, A. Classification of Multivariate Objects Using Interval Quantile Classes. J Classif 28, 327–362 (2011). https://doi.org/10.1007/s00357-011-9088-6
DOI: https://doi.org/10.1007/s00357-011-9088-6