Abstract
We analyse several ways of constructing an aggregated measure of the development of socio-economic objects with respect to a composite phenomenon (i.e., a phenomenon described by many statistical features) when the relevant data are expressed as intervals. Such a measure, based on the deviation of the data structure of a given object from a benchmark of development, is a useful tool for ordering, comparing and clustering objects. We present the construction of a composite phenomenon described by interval data and discuss various aspects of the stimulation and normalization of the diagnostic features, as well as the definition of a benchmark of development (usually based on optimum or expected levels of these features). We consider three options: transformation of the interval model into a single-valued version without any significant loss of its statistical properties, standardization of pure intervals, and definition of an interval “ideal” object. The distance between intervals is determined using the Hausdorff formula. The simulation study and the empirical analysis showed that the first two variants are especially useful in practice.
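As a minimal illustration of the distance mentioned above, the Hausdorff distance between two closed intervals [a_lo, a_hi] and [b_lo, b_hi] reduces to max(|a_lo - b_lo|, |a_hi - b_hi|). The sketch below computes it and, purely as an assumption for illustration, sums it over features to measure how far an object lies from a benchmark. The function names and the choice of a plain sum are hypothetical and are not taken from the article.

```python
from typing import Sequence, Tuple

Interval = Tuple[float, float]


def hausdorff_interval(a: Interval, b: Interval) -> float:
    """Hausdorff distance between two closed intervals given as (lower, upper)."""
    (a_lo, a_hi), (b_lo, b_hi) = a, b
    if a_lo > a_hi or b_lo > b_hi:
        raise ValueError("each interval must satisfy lower <= upper")
    # For closed intervals the Hausdorff distance is the larger endpoint gap.
    return max(abs(a_lo - b_lo), abs(a_hi - b_hi))


def distance_to_benchmark(obj: Sequence[Interval],
                          benchmark: Sequence[Interval]) -> float:
    """Sum of per-feature Hausdorff distances (aggregation by sum is an assumption)."""
    if len(obj) != len(benchmark):
        raise ValueError("object and benchmark must have the same number of features")
    return sum(hausdorff_interval(x, y) for x, y in zip(obj, benchmark))


# Hypothetical object with two interval-valued features and a benchmark.
example_object = [(1.0, 3.0), (0.0, 1.0)]
example_benchmark = [(2.0, 6.0), (0.0, 1.0)]
example_distance = distance_to_benchmark(example_object, example_benchmark)
```

Here the first feature contributes max(|1 - 2|, |3 - 6|) = 3 and the second contributes 0, so a smaller total indicates an object closer to the benchmark.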





Acknowledgments
I would like to express my gratitude to the anonymous reviewer for interesting and very useful comments and suggestions.
Cite this article
Młodak, A. On the construction of an aggregated measure of the development of interval data. Comput Stat 29, 895–929 (2014). https://doi.org/10.1007/s00180-013-0469-7