Abstract
‘Symbolic Data Analysis’ (SDA) provides tools for analyzing ‘symbolic’ data, i.e., data matrices X = (x_kj) whose entries x_kj are intervals, sets of categories, or frequency distributions instead of ‘single values’ (a real number or a category) as in the classical case. A large number of empirical algorithms generalize classical data analysis methods (PCA, clustering, factor analysis, etc.) to the ‘symbolic’ case. In this context, various optimization problems arise (optimum class centers, optimum clustering, optimum scaling,…). This paper presents some cases related to dissimilarities and class centers where explicit solutions are possible, and we integrate these results into an appropriate k-means clustering algorithm. Moreover, as a first step toward probabilistically based results in SDA, we consider the definition and determination of set-valued class ‘centers’ in SDA and relate them to theorems on the ‘approximation of distributions by sets’.
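The abstract does not say which dissimilarities admit explicit class centers. The Python sketch below illustrates one standard case under stated assumptions, not necessarily one treated in the chapter: interval-valued data compared by a Hausdorff-type distance summed over the variables. Since max(|a - a'|, |b - b'|) = |m - m'| + |l - l'| for midpoints m and half-lengths l, the optimal center of a class follows explicitly from medians, and it can be plugged into a k-means-type alternating algorithm. The names `hausdorff`, `hausdorff_center`, and `interval_kmeans` are hypothetical helpers introduced only for this illustration.

```python
import random
from statistics import median

# Hypothetical representation (not from the chapter): one interval (lower, upper)
# per variable, so a data row is a list of p intervals.
Interval = tuple[float, float]
Row = list[Interval]


def hausdorff(x: Row, y: Row) -> float:
    """Sum over variables of the 1-D Hausdorff distance max(|a - a'|, |b - b'|)."""
    return sum(max(abs(a1 - a2), abs(b1 - b2)) for (a1, b1), (a2, b2) in zip(x, y))


def hausdorff_center(rows: list[Row]) -> Row:
    """Interval row minimizing the sum of hausdorff() distances to `rows`.

    With midpoint m = (a + b) / 2 and half-length l = (b - a) / 2 one has
    max(|a - a'|, |b - b'|) = |m - m'| + |l - l'|, so each variable splits into
    two 1-D L1 problems whose minimizers are medians.
    """
    center: Row = []
    for j in range(len(rows[0])):
        mids = [(r[j][0] + r[j][1]) / 2 for r in rows]
        halves = [(r[j][1] - r[j][0]) / 2 for r in rows]
        m, l = median(mids), median(halves)  # l >= 0, so m - l <= m + l
        center.append((m - l, m + l))
    return center


def interval_kmeans(rows: list[Row], k: int, n_iter: int = 20, seed: int = 0):
    """k-means-type alternating algorithm using the explicit class centers."""
    rng = random.Random(seed)
    centers = [list(r) for r in rng.sample(rows, k)]  # k randomly chosen rows as start
    labels: list[int] = []
    for _ in range(n_iter):
        # Assignment step: each row goes to its nearest center.
        new_labels = [min(range(k), key=lambda c: hausdorff(r, centers[c])) for r in rows]
        if new_labels == labels:
            break  # partition is stable, so recomputed centers would not change
        labels = new_labels
        # Center step: replace each nonempty class center by its optimal center.
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centers[c] = hausdorff_center(members)
    return labels, centers


# Usage: two well-separated groups of one-variable interval data.
data = [[(1.0, 2.0)], [(1.5, 2.5)], [(1.2, 2.2)],
        [(10.0, 14.0)], [(11.0, 13.0)], [(10.5, 14.5)]]
labels, centers = interval_kmeans(data, k=2)
```

Neither the assignment step nor the center step can increase the total within-class distance, so the loop stops once the partition no longer changes; as with ordinary k-means, the result may be only a local optimum that depends on the random start.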
Copyright information
© 2005 Springer-Verlag Berlin · Heidelberg
About this chapter
Cite this chapter
Bock, HH. (2005). Optimization in Symbolic Data Analysis: Dissimilarities, Class Centers, and Clustering. In: Baier, D., Decker, R., Schmidt-Thieme, L. (eds) Data Analysis and Decision Support. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-28397-8_1
DOI: https://doi.org/10.1007/3-540-28397-8_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-26007-3
Online ISBN: 978-3-540-28397-3
eBook Packages: Mathematics and Statistics (R0)