Optimization in Symbolic Data Analysis: Dissimilarities, Class Centers, and Clustering

Bock, Hans-Hermann

doi:10.1007/3-540-28397-8_1


Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization (STUDIES CLASS)


Abstract

‘Symbolic Data Analysis’ (SDA) provides tools for analyzing ‘symbolic’ data, i.e., data matrices X = (x_kj) whose entries x_kj are intervals, sets of categories, or frequency distributions instead of ‘single values’ (a real number, a category) as in the classical case. A large number of empirical algorithms generalize classical data analysis methods (PCA, clustering, factor analysis, etc.) to the ‘symbolic’ case. In this context, various optimization problems are formulated (optimum class centers, optimum clustering, optimum scaling, etc.). This paper presents some cases related to dissimilarities and class centers where explicit solutions are possible. These results can be integrated into an appropriate k-means clustering algorithm. Moreover, as a first step towards probabilistically based results in SDA, we consider the definition and determination of set-valued class ‘centers’ in SDA and relate them to theorems on the ‘approximation of distributions by sets’.
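
To make the ingredients named in the abstract concrete (a dissimilarity, a class center with an explicit solution, and a k-means-type iteration), the following is a minimal sketch for interval-valued data. It rests on assumptions that the abstract does not state: the dissimilarity is taken to be the sum over variables of the Hausdorff distance between intervals, and the function names (`hausdorff`, `dissimilarity`, `class_center`, `k_means`) are hypothetical. Under this assumed criterion, the interval built from the median midpoint and the median half-length of the class members minimizes the within-class sum of distances for each variable, which is one example of an explicit solution. It is an illustration, not the chapter's own algorithm.

```python
from statistics import median


def hausdorff(a, b):
    """Hausdorff distance between two closed intervals a = (lo, hi), b = (lo, hi)."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


def dissimilarity(x, y):
    """Assumed dissimilarity: sum of per-variable Hausdorff distances."""
    return sum(hausdorff(a, b) for a, b in zip(x, y))


def class_center(members):
    """Per-variable interval center: median midpoint +/- median half-length.

    For intervals, the Hausdorff distance equals
    |midpoint difference| + |half-length difference|, so the two medians
    minimize the summed distance to the class members.
    """
    center = []
    for column in zip(*members):  # all intervals of one variable
        mid = median((lo + hi) / 2 for lo, hi in column)
        half = median((hi - lo) / 2 for lo, hi in column)
        center.append((mid - half, mid + half))
    return center


def k_means(data, initial_centers, max_iter=100):
    """Alternate between assigning objects and recomputing class centers."""
    centers = [list(c) for c in initial_centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each object goes to its closest center.
        new_labels = [
            min(range(len(centers)), key=lambda c: dissimilarity(x, centers[c]))
            for x in data
        ]
        if new_labels == labels:  # partition unchanged: stop
            break
        labels = new_labels
        # Update step: recompute the center of every non-empty class.
        for c in range(len(centers)):
            members = [x for x, lab in zip(data, labels) if lab == c]
            if members:
                centers[c] = class_center(members)
    return labels, centers
```

Each object is a list of `(lower, upper)` tuples, one per variable, and `initial_centers` has the same shape; a call such as `k_means(data, [data[0], data[3]])` returns the class labels and the interval-valued centers. Other dissimilarities would lead to other center formulas, so `class_center` should be replaced together with `dissimilarity`.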



Author information

Authors and Affiliations

Institut für Statistik und Wirtschaftsmathematik, RWTH Aachen, Wüllnerstr. 3, D-52056, Aachen, Germany
Hans-Hermann Bock


Editor information

Editors and Affiliations

Institute of Business Administration and Economics, Brandenburg University of Technology Cottbus, Konrad-Wachsmann-Allee 1, 03046, Cottbus, Germany
Daniel Baier (Chair of Marketing and Innovation Management)
Department of Business Administration and Economics, Bielefeld University, Universitätsstr. 25, 33615, Bielefeld, Germany
Reinhold Decker (Chair of Marketing)
Computer Based New Media Group (CGNM), Institute for Computer Science, University of Freiburg, Georges-Köhler-Allee 51, 79110, Freiburg, Germany
Lars Schmidt-Thieme


About this chapter

Cite this chapter

Bock, HH. (2005). Optimization in Symbolic Data Analysis: Dissimilarities, Class Centers, and Clustering. In: Baier, D., Decker, R., Schmidt-Thieme, L. (eds) Data Analysis and Decision Support. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-28397-8_1


DOI: https://doi.org/10.1007/3-540-28397-8_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-26007-3
Online ISBN: 978-3-540-28397-3
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)
