Optimization in Symbolic Data Analysis: Dissimilarities, Class Centers, and Clustering

Chapter in: Data Analysis and Decision Support

Abstract

‘Symbolic Data Analysis’ (SDA) provides tools for analyzing ‘symbolic’ data, i.e., data matrices $X = (x_{kj})$ whose entries $x_{kj}$ are intervals, sets of categories, or frequency distributions instead of ‘single values’ (a real number, a category) as in the classical case. A large number of empirical algorithms generalize classical data analysis methods (PCA, clustering, factor analysis, etc.) to the ‘symbolic’ case. In this context, various optimization problems arise (optimum class centers, optimum clustering, optimum scaling, …). This paper presents some cases related to dissimilarities and class centers where explicit solutions are possible. These results can be integrated into an appropriate k-means clustering algorithm. Moreover, as a first step toward probabilistically based results in SDA, we consider the definition and determination of set-valued class ‘centers’ in SDA and relate them to theorems on the ‘approximation of distributions by sets’.
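The abstract does not say which dissimilarities or center formulas the chapter uses, so the following is only a minimal sketch under stated assumptions, not the chapter's own method. It assumes interval-valued data and a dissimilarity that sums per-variable Hausdorff distances between intervals (a distance of the kind studied by Chavent and Lechevallier, 2002, in the references). For that choice the optimal class center has an explicit solution: writing an interval by its midpoint and half-length, the Hausdorff distance becomes $|m - m'| + |\ell - \ell'|$, so the sum over a class is minimized by the median midpoint and the median half-length. The functions `hausdorff_interval`, `dissimilarity`, `class_center`, and `k_means` are hypothetical names introduced for illustration.

```python
from statistics import median
from typing import List, Sequence, Tuple

# Hypothetical representation: an interval is (lower, upper) with lower <= upper;
# a symbolic object is a row of p interval-valued variables.
Interval = Tuple[float, float]
Row = List[Interval]


def hausdorff_interval(u: Interval, v: Interval) -> float:
    """Hausdorff distance between intervals [a, b] and [a', b']."""
    return max(abs(u[0] - v[0]), abs(u[1] - v[1]))


def dissimilarity(x: Row, y: Row) -> float:
    """Assumed choice: per-variable Hausdorff distances summed over variables."""
    return sum(hausdorff_interval(u, v) for u, v in zip(x, y))


def class_center(rows: Sequence[Row]) -> Row:
    """Interval-valued center minimizing the summed dissimilarity to `rows`.

    With midpoint m and half-length h, max(|a - a'|, |b - b'|) equals
    |m - m'| + |h - h'|, so each term is minimized separately by a median.
    """
    center: Row = []
    for j in range(len(rows[0])):
        mids = [(r[j][0] + r[j][1]) / 2 for r in rows]
        halves = [(r[j][1] - r[j][0]) / 2 for r in rows]
        m, h = median(mids), median(halves)  # median of half-lengths is >= 0
        center.append((m - h, m + h))
    return center


def k_means(rows: Sequence[Row], init_centers: Sequence[Row],
            max_iter: int = 100) -> Tuple[List[int], List[Row]]:
    """Alternate assignment and center steps until the partition is stable."""
    centers = [list(c) for c in init_centers]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each object goes to its closest class center.
        new_labels = [
            min(range(len(centers)), key=lambda i: dissimilarity(x, centers[i]))
            for x in rows
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Center step: recompute the explicit optimal center of each class.
        for i in range(len(centers)):
            members = [x for x, lab in zip(rows, labels) if lab == i]
            if members:  # an empty class keeps its previous center
                centers[i] = class_center(members)
    return labels, centers
```

For example, `k_means([[(1.0, 3.0)], [(2.0, 4.0)], [(10.0, 12.0)]], [[(0.0, 1.0)], [(20.0, 21.0)]])` separates the two low intervals from the high one. Because this dissimilarity is L1-like, the center step uses medians rather than means, so the loop behaves like a k-medians variant of k-means; neither step increases the total within-class dissimilarity, and `max_iter` caps the number of passes.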

References

  • AUMANN, R.J. (1965): Integrals and Set-Valued Functions. J. Math. Analysis and Appl. 12, 1–12.

  • BOCK, H.-H. (2002): Clustering Methods and Kohonen Maps for Symbolic Data. J. Japan. Soc. Comput. Statistics 15, 1–13.

  • BOCK, H.-H. (2005): Visualizing Symbolic Data by Kohonen Maps. In: M. Noirhomme and E. Diday (Eds.): Symbolic Data Analysis and the SODAS Software. Wiley, New York. (In press.)

  • BOCK, H.-H. and DIDAY, E. (2000): Analysis of Symbolic Data. Exploratory Methods for Extracting Statistical Information from Complex Data. Studies in Classification, Data Analysis, and Knowledge Organization. Springer Verlag, Heidelberg-Berlin.

  • CHAVENT, M. (2004): A Hausdorff Distance Between Hyperrectangles for Clustering Interval Data. In: D. Banks, L. House, F.R. McMorris, P. Arabie, and W. Gaul (Eds.): Classification, Clustering, and Data Mining Applications. Studies in Classification, Data Analysis, and Knowledge Organization. Springer Verlag, Heidelberg, 2004, 333–339.

  • CHAVENT, M. and LECHEVALLIER, Y. (2002): Dynamical Clustering of Interval Data: Optimization of an Adequacy Criterion Based on Hausdorff Distance. In: K. Jajuga, A. Sokolowski, and H.-H. Bock (Eds.): Classification, Clustering, and Data Analysis. Studies in Classification, Data Analysis, and Knowledge Organization. Springer Verlag, Heidelberg, 2002, 53–60.

  • KÖRNER, R. (1995): A Variance of Compact Convex Random Sets. Fakultät für Mathematik und Informatik, Bergakademie Freiberg.

  • KÄÄRIK, M. (2000): Approximation of Distributions by Spheres. In: Multivariate Statistics. New Trends in Probability and Statistics. Vol. 5. VSP/TEV, Vilnius-Utrecht-Tokyo, 61–66.

  • KÄÄRIK, M. (2005): Fitting Sets to Probability Distributions. Doctoral thesis, Faculty of Mathematics and Computer Science, University of Tartu, Estonia.

  • KÄÄRIK, M. and PÄRNA, K. (2003): Fitting Parametric Sets to Probability Distributions. Acta et Commentationes Universitatis Tartuensis de Mathematica 8, 101–112.

  • MATHERON, G. (1975): Random Sets and Integral Geometry. Wiley, New York.

  • MOLCHANOV, I. (1997): Statistical Problems for Random Sets. In: J. Goutsias (Ed.): Random Sets: Theory and Applications. Springer, Heidelberg, 27–45.

  • NORDHOFF, O. (2003): Erwartungswerte zufälliger Quader. Diploma thesis, Institute of Statistics, RWTH Aachen University.

  • PÄRNA, K., LEMBER, J., and VIIART, A. (1999): Approximating Distributions by Sets. In: W. Gaul and H. Locarek-Junge (Eds.): Classification in the Information Age. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Heidelberg, 215–224.

Copyright information

© 2005 Springer-Verlag Berlin · Heidelberg

Cite this chapter

Bock, HH. (2005). Optimization in Symbolic Data Analysis: Dissimilarities, Class Centers, and Clustering. In: Baier, D., Decker, R., Schmidt-Thieme, L. (eds) Data Analysis and Decision Support. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-28397-8_1
