Hard and Soft Euclidean Consensus Partitions

  • Conference paper

Abstract

The Euclidean partition dissimilarity $d(P, \tilde{P})$ (Dimitriadou et al., 2002) is defined as the square root of the minimal sum of squared differences of the class membership values of the partitions $P$ and $\tilde{P}$, with the minimum taken over all matchings between the classes of the partitions. We first discuss some theoretical properties of this dissimilarity measure. We then consider the Euclidean consensus problem for partition ensembles, i.e., the problem of finding a hard or soft partition $P$ with a given number of classes that minimizes the (possibly weighted) sum $\sum_b w_b \, d(P_b, P)^2$ of squared Euclidean dissimilarities $d$ between $P$ and the elements $P_b$ of the ensemble. This problem is NP-hard and is related to consensus problems studied in Gordon and Vichi (2001). We present an efficient “Alternating Optimization” (AO) heuristic for finding $P$, which alternates between optimally rematching classes for fixed memberships and optimizing class memberships for fixed matchings. An implementation of such AO algorithms for consensus partitions is available in the R extension package clue. We illustrate the algorithm on two data sets (the popular Rosenberg-Kim kinship terms data and a macroeconomic one) employed by Gordon & Vichi.
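To make the two ingredients of the abstract concrete, the following is a minimal Python sketch, not the implementation in clue. It assumes that every partition is given as an n × k membership matrix (a list of rows), that all partitions have the same number of classes k, and that k is small enough to enumerate all k! class matchings by brute force; a practical implementation would presumably solve the matching step with a linear sum assignment solver instead. The function names (`sq_dist`, `best_matching`, `euclidean_dissimilarity`, `consensus`) are hypothetical and chosen only for illustration.

```python
from itertools import permutations


def sq_dist(M, N, perm):
    """Sum of squared membership differences when class j of M
    is matched to class perm[j] of N."""
    return sum((m_row[j] - n_row[perm[j]]) ** 2
               for m_row, n_row in zip(M, N)
               for j in range(len(perm)))


def best_matching(M, N):
    """Brute-force search over all class matchings (assumes small k)."""
    k = len(M[0])
    return min(permutations(range(k)), key=lambda p: sq_dist(M, N, p))


def euclidean_dissimilarity(M, N):
    """Square root of the minimal sum of squared membership differences."""
    return sq_dist(M, N, best_matching(M, N)) ** 0.5


def consensus(ensemble, weights=None, hard=False, max_iter=100):
    """AO-style heuristic: alternate between rematching classes and
    updating memberships. Starts from the first ensemble member
    (an arbitrary choice made for this sketch)."""
    if weights is None:
        weights = [1.0] * len(ensemble)
    total = float(sum(weights))
    n, k = len(ensemble[0]), len(ensemble[0][0])
    P = [list(row) for row in ensemble[0]]
    previous = None
    for _ in range(max_iter):
        # Step 1: optimally rematch classes for fixed memberships P.
        perms = [best_matching(P, M) for M in ensemble]
        if perms == previous:
            break  # matchings unchanged, so P would not change either
        previous = perms
        # Step 2: for fixed matchings, the weighted mean of the rematched
        # memberships minimizes the weighted sum of squared differences.
        P = [[sum(w * M[i][p[j]]
                  for w, M, p in zip(weights, ensemble, perms)) / total
              for j in range(k)]
             for i in range(n)]
        if hard:
            # Hard case: put each object in its class of largest mean membership.
            for row in P:
                c = max(range(k), key=row.__getitem__)
                row[:] = [1.0 if j == c else 0.0 for j in range(k)]
    return P


# Three hard partitions of four objects into two classes.
A = [[1, 0], [1, 0], [0, 1], [0, 1]]
B = [[0, 1], [0, 1], [1, 0], [1, 0]]  # same partition as A, labels swapped
C = [[1, 0], [0, 1], [0, 1], [0, 1]]  # second object moved to the other class

print(euclidean_dissimilarity(A, B))  # 0.0, since relabeling is matched away
print(consensus([A, B, C], hard=True))
print(consensus([A, B, C], hard=False))
```

In this toy ensemble the hard consensus coincides with A, while the soft consensus gives the second object memberships of about 2/3 and 1/3. Like any alternating scheme of this kind, the sketch can stop at a local optimum that depends on the starting partition, so several starts may be worth trying.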

References

  • BARTHÉLEMY, J.P. and MONJARDET, B. (1981): The median procedure in cluster analysis and social choice theory. Mathematical Social Sciences, 1, 235-267.

  • BARTHÉLEMY, J.P. and MONJARDET, B. (1988): The median procedure in data analysis: new results and open problems. In: H. H. Bock, editor, Classification and related methods of data analysis. North-Holland, Amsterdam, 309-316.

  • BOORMAN, S. A. and ARABIE, P. (1972): Structural measures and the method of sorting. In R. N. Shepard, A. K. Romney and S. B. Nerlove, editors, Multidimensional Scaling: Theory and Applications in the Behavioral Sciences, 1: Theory. Seminar Press, New York, 225-249.

  • CHARON, I., DENOEUD, L., GUENOCHE, A. and HUDRY, O. (2006): Maximum transfer distance between partitions. Journal of Classification, 23(1), 103-121.

  • DAY, W. H. E. (1981): The complexity of computing metric distances between partitions. Mathematical Social Sciences, 1, 269-287.

  • DIMITRIADOU, E., WEINGESSEL, A. and HORNIK, K. (2002): A combination scheme for fuzzy clustering. International Journal of Pattern Recognition and Artificial Intelligence, 16(7), 901-912.

  • GAUL, W. and SCHADER, M. (1988): Clusterwise aggregation of relations. Applied Stochastic Models and Data Analysis, 4, 273-282.

  • GORDON, A. D. and VICHI, M. (1998): Partitions of partitions. Journal of Classification, 15, 265-285.

  • GORDON, A. D. and VICHI, M. (2001): Fuzzy partition models for fitting a set of partitions. Psychometrika, 66(2), 229-248.

  • GUSFIELD, D. (2002): Partition-distance: A problem and class of perfect graphs arising in clustering. Information Processing Letters, 82, 159-164.

  • HORNIK, K. (2005a): A CLUE for CLUster Ensembles. Journal of Statistical Software, 14(12). URL http://www.jstatsoft.org/v14/i12/.

  • HORNIK, K. (2005b): Cluster ensembles. In C. Weihs and W. Gaul, editors, Classification - The Ubiquitous Challenge. Proceedings of the 28th Annual Conference of the Gesellschaft für Klassifikation e.V., University of Dortmund, March 9-11, 2004. Springer-Verlag, Heidelberg, 65-72.

  • HORNIK, K. (2007a): clue: Cluster Ensembles. R package version 0.3-12.

  • HORNIK, K. (2007b): On maximal Euclidean partition dissimilarity. Under preparation.

  • HORNIK, K. and BÖHM, W. (2007): Alternating optimization algorithms for Euclidean and Manhattan consensus partitions. Under preparation.

  • MIRKIN, B. G. (1974): The problem of approximation in space of relations and qualitative data analysis. Automatika y Telemechanika, translated in: Information and Remote Control, 35, 1424-1438.

  • PAPADIMITRIOU, C. and STEIGLITZ, K. (1982): Combinatorial Optimization: Algorithms and Complexity. Prentice Hall, Englewood Cliffs.

  • ROSENBERG, S. (1982): The method of sorting in multivariate research with applications selected from cognitive psychology and person perception. In N. Hirschberg and L. G. Humphreys, editors, Multivariate Applications in the Social Sciences. Erlbaum, Hillsdale, New Jersey, 117-142.

  • ROSENBERG, S. and KIM, M. P. (1975): The method of sorting as a data-gathering procedure in multivariate research. Multivariate Behavioral Research, 10, 489-502.

  • RUBIN, J. (1967): Optimal classification into groups: An approach for solving the taxonomy problem. Journal of Theoretical Biology, 15, 103-144.

  • WAKABAYASHI, Y. (1998): The complexity of computing median relations. Resenhas do Instituto de Matemática e Estatística da Universidade de São Paulo, 3/3, 323-349.

  • ZHOU, D., LI, J. and ZHA, H. (2005): A new Mallows distance based metric for comparing clusterings. In ICML ’05: Proceedings of the 22nd International Conference on Machine Learning. ISBN 1-59593-180-5. ACM Press, New York, NY, USA, 1028-1035.

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hornik, K., Böhm, W. (2008). Hard and Soft Euclidean Consensus Partitions. In: Preisach, C., Burkhardt, H., Schmidt-Thieme, L., Decker, R. (eds) Data Analysis, Machine Learning and Applications. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78246-9_18
