Abstract
The Euclidean partition dissimilarity $d(P, \tilde{P})$ (Dimitriadou et al., 2002) is defined as the square root of the minimal sum of squared differences of the class membership values of the partitions $P$ and $\tilde{P}$, with the minimum taken over all matchings between the classes of the partitions. We first discuss some theoretical properties of this dissimilarity measure. Then, we look at the Euclidean consensus problem for partition ensembles, i.e., the problem of finding a hard or soft partition $P$ with a given number of classes which minimizes the (possibly weighted) sum $\sum_b w_b \, d(P_b, P)^2$ of squared Euclidean dissimilarities $d$ between $P$ and the elements $P_b$ of the ensemble. This is an NP-hard problem, and it is related to consensus problems studied in Gordon and Vichi (2001). We present an efficient “Alternating Optimization” (AO) heuristic for finding $P$, which iterates between optimally rematching classes for fixed memberships and optimizing class memberships for fixed matchings. An implementation of such AO algorithms for consensus partitions is available in the R extension package clue. We illustrate this algorithm on two data sets (the popular Rosenberg-Kim kinship terms data and a macroeconomic one) employed by Gordon and Vichi.
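The dissimilarity and the two alternating steps described in the abstract can be illustrated with a minimal Python sketch. It is not the implementation in the clue package. The function names (`best_matching`, `dissimilarity`, `ao_consensus`) are hypothetical, partitions are assumed to be given as n-by-k membership matrices (lists of rows) with the same number of classes k, and the optimal matching is found by brute force over all permutations, which is only feasible for small k; a linear sum assignment solver would be the usual choice in practice. Starting from the first ensemble member is likewise an assumption of this sketch.

```python
from itertools import permutations
from math import sqrt


def best_matching(P, Q):
    """Return (perm, cost), where perm[j] is the class of Q matched to class j
    of P and cost is the minimal sum of squared membership differences."""
    k = len(P[0])
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(k)):  # brute force: small k only
        cost = sum((p_row[j] - q_row[perm[j]]) ** 2
                   for p_row, q_row in zip(P, Q) for j in range(k))
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return best_perm, best_cost


def dissimilarity(P, Q):
    """Euclidean partition dissimilarity: square root of the minimal cost."""
    return sqrt(best_matching(P, Q)[1])


def ao_consensus(ensemble, weights=None, hard=False, max_iter=100):
    """Alternate between rematching classes and updating memberships.
    Returns the consensus memberships and the criterion value reached."""
    n, k = len(ensemble[0]), len(ensemble[0][0])
    if weights is None:
        weights = [1.0] * len(ensemble)
    total = sum(weights)
    P = [list(row) for row in ensemble[0]]  # assumed starting point
    prev = value = float("inf")
    for _ in range(max_iter):
        # Step 1: optimally rematch classes for fixed memberships.
        perms = [best_matching(P, Q)[0] for Q in ensemble]
        # Step 2: optimize memberships for fixed matchings. For a soft
        # partition this is the weighted mean of the matched memberships.
        P = [[sum(w * Q[i][perm[j]]
                  for w, Q, perm in zip(weights, ensemble, perms)) / total
              for j in range(k)] for i in range(n)]
        if hard:
            # For a hard partition, assign each object to the class with the
            # largest mean membership (ties go to the lowest class index).
            hard_P = []
            for row in P:
                top = max(range(k), key=row.__getitem__)
                hard_P.append([1.0 if j == top else 0.0 for j in range(k)])
            P = hard_P
        # Weighted sum of squared dissimilarities to the ensemble members.
        value = sum(w * best_matching(P, Q)[1]
                    for w, Q in zip(weights, ensemble))
        if prev - value < 1e-12:  # no further improvement
            break
        prev = value
    return P, value


# Two hard partitions of three objects that differ only in class labels.
A = [[1, 0], [1, 0], [0, 1]]
B = [[0, 1], [0, 1], [1, 0]]
consensus, criterion = ao_consensus([A, B])
print(dissimilarity(A, B), criterion)
```

Neither step can increase the criterion, so its value is non-increasing over the iterations. Like any AO heuristic for an NP-hard problem, the sketch may stop at a local optimum that depends on the starting partition.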
References
BARTHÉLEMY, J.P. and MONJARDET, B. (1981): The median procedure in cluster analysis and social choice theory. Mathematical Social Sciences, 1, 235-267.
BARTHÉLEMY, J.P. and MONJARDET, B. (1988): The median procedure in data analysis: new results and open problems. In: H. H. Bock, editor, Classification and related methods of data analysis. North-Holland, Amsterdam, 309-316.
BOORMAN, S. A. and ARABIE, P. (1972): Structural measures and the method of sorting. In R. N. Shepard, A. K. Romney and S. B. Nerlove, editors, Multidimensional Scaling: Theory and Applications in the Behavioral Sciences, 1: Theory. Seminar Press, New York, 225-249.
CHARON, I., DENOEUD, L., GUENOCHE, A. and HUDRY, O. (2006): Maximum transfer distance between partitions. Journal of Classification, 23(1), 103-121.
DAY, W. H. E. (1981): The complexity of computing metric distances between partitions. Mathematical Social Sciences, 1, 269-287.
DIMITRIADOU, E., WEINGESSEL, A. and HORNIK, K. (2002): A combination scheme for fuzzy clustering. International Journal of Pattern Recognition and Artificial Intelligence, 16(7), 901-912.
GAUL, W. and SCHADER, M. (1988): Clusterwise aggregation of relations. Applied Stochastic Models and Data Analysis, 4, 273-282.
GORDON, A. D. and VICHI, M. (1998): Partitions of partitions. Journal of Classification, 15, 265-285.
GORDON, A. D. and VICHI, M. (2001): Fuzzy partition models for fitting a set of partitions. Psychometrika, 66(2), 229-248.
GUSFIELD, D. (2002): Partition-distance: A problem and class of perfect graphs arising in clustering. Information Processing Letters, 82, 159-164.
HORNIK, K. (2005a): A CLUE for CLUster Ensembles. Journal of Statistical Software, 14(12). URL http://www.jstatsoft.org/v14/i12/.
HORNIK, K. (2005b): Cluster ensembles. In C. Weihs and W. Gaul, editors, Classification - The Ubiquitous Challenge. Proceedings of the 28th Annual Conference of the Gesellschaft für Klassifikation e.V., University of Dortmund, March 9-11, 2004. Springer-Verlag, Heidelberg, 65-72.
HORNIK, K. (2007a): clue: Cluster Ensembles. R package version 0.3-12.
HORNIK, K. (2007b): On maximal euclidean partition dissimilarity. Under preparation.
HORNIK, K. and BÖHM, W. (2007): Alternating optimization algorithms for Euclidean and Manhattan consensus partitions. Under preparation.
MIRKIN, B.G. (1974): The problem of approximation in space of relations and qualitative data analysis. Automatika y Telemechanika, translated in: Information and Remote Control, 35, 1424-1438.
PAPADIMITRIOU, C. and STEIGLITZ, K. (1982): Combinatorial Optimization: Algorithms and Complexity. Prentice Hall, Englewood Cliffs.
ROSENBERG, S. (1982): The method of sorting in multivariate research with applications selected from cognitive psychology and person perception. In N. Hirschberg and L. G. Humphreys, editors, Multivariate Applications in the Social Sciences. Erlbaum, Hillsdale, New Jersey, 117-142.
ROSENBERG, S. and KIM, M. P. (1975): The method of sorting as a data-gathering procedure in multivariate research. Multivariate Behavioral Research, 10, 489-502.
RUBIN, J. (1967): Optimal classification into groups: An approach for solving the taxonomy problem. Journal of Theoretical Biology, 15, 103-144.
WAKABAYASHI, Y. (1998): The complexity of computing median relations. Resenhas do Instituto de Matemática e Estatística, Universidade de São Paulo, 3/3, 323-349.
ZHOU, D., LI, J. and ZHA, H. (2005): A new Mallows distance based metric for comparing clusterings. In ICML ’05: Proceedings of the 22nd International Conference on Machine Learning. ISBN 1-59593-180-5. ACM Press, New York, NY, USA, 1028-1035.
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hornik, K., Böhm, W. (2008). Hard and Soft Euclidean Consensus Partitions. In: Preisach, C., Burkhardt, H., Schmidt-Thieme, L., Decker, R. (eds) Data Analysis, Machine Learning and Applications. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78246-9_18
DOI: https://doi.org/10.1007/978-3-540-78246-9_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78239-1
Online ISBN: 978-3-540-78246-9
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)