Hard and Soft Euclidean Consensus Partitions

Hornik, Kurt; Böhm, Walter

doi:10.1007/978-3-540-78246-9_18

Conference paper

Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization (STUDIES CLASS)

Abstract

Euclidean partition dissimilarity $d(P, \tilde{P})$ (Dimitriadou et al., 2002) is defined as the square root of the minimal sum of squared differences of the class membership values of the partitions $P$ and $\tilde{P}$, with the minimum taken over all matchings between the classes of the partitions. We first discuss some theoretical properties of this dissimilarity measure. Then, we look at the Euclidean consensus problem for partition ensembles, i.e., the problem of finding a hard or soft partition $P$ with a given number of classes which minimizes the (possibly weighted) sum $\sum_b w_b \, d(P_b, P)^2$ of squared Euclidean dissimilarities $d$ between $P$ and the elements $P_b$ of the ensemble. This is an NP-hard problem, and it is related to consensus problems studied in Gordon and Vichi (2001). We present an efficient “Alternating Optimization” (AO) heuristic for finding $P$, which iterates between optimally rematching classes for fixed memberships and optimizing class memberships for fixed matchings. An implementation of such AO algorithms for consensus partitions is available in the R extension package clue. We illustrate this algorithm on two data sets (the popular Rosenberg-Kim kinship terms data and a macroeconomic one) employed by Gordon and Vichi.
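
The two ingredients named in the abstract, the matching-based dissimilarity and the alternating scheme, can be illustrated with a minimal Python sketch. It is not the clue implementation, and every function name in it is hypothetical. It assumes that each partition is given as an n × k membership matrix (rows sum to one) with the same number of classes k, that matchings are found by brute force over all column permutations (feasible only for small k; an assignment-problem solver would be used otherwise), and that the soft consensus memberships for fixed matchings are the weighted mean of the matched matrices, which is the least-squares minimizer of the weighted sum of squared differences.

```python
from itertools import permutations
from math import sqrt


def match_columns(m, target):
    """Brute-force search for the column permutation of m closest to target.

    Returns (perm, cost), where cost is the minimal sum of squared
    differences between m[:, perm] and target.
    """
    k = len(target[0])
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(k)):
        cost = sum(
            (row_m[perm[j]] - row_t[j]) ** 2
            for row_m, row_t in zip(m, target)
            for j in range(k)
        )
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return best_perm, best_cost


def euclidean_dissimilarity(m1, m2):
    """Square root of the minimal sum of squared membership differences."""
    return sqrt(match_columns(m1, m2)[1])


def soft_consensus(ensemble, weights=None, max_iter=50, tol=1e-12):
    """Alternating optimization sketch for a soft Euclidean consensus.

    Returns (consensus, value), where value is the weighted sum of
    squared dissimilarities between the consensus and the ensemble.
    """
    if weights is None:
        weights = [1.0] * len(ensemble)
    total = sum(weights)
    n, k = len(ensemble[0]), len(ensemble[0][0])
    consensus = [row[:] for row in ensemble[0]]  # assumed start: first member
    previous = float("inf")
    value = previous
    for _ in range(max_iter):
        # Step 1: optimally rematch classes for fixed memberships.
        perms = [match_columns(m, consensus)[0] for m in ensemble]
        # Step 2: optimize memberships for fixed matchings (weighted mean).
        consensus = [
            [
                sum(w * m[i][p[j]] for w, m, p in zip(weights, ensemble, perms))
                / total
                for j in range(k)
            ]
            for i in range(n)
        ]
        value = sum(
            w * match_columns(m, consensus)[1]
            for w, m in zip(weights, ensemble)
        )
        if previous - value < tol:  # stop once the criterion no longer decreases
            break
        previous = value
    return consensus, value


# Toy ensemble: three hard partitions of four objects into two classes.
p1 = [[1, 0], [1, 0], [0, 1], [0, 1]]
p2 = [[0, 1], [0, 1], [1, 0], [1, 0]]  # same partition as p1, labels swapped
p3 = [[1, 0], [0, 1], [0, 1], [0, 1]]
consensus, value = soft_consensus([p1, p2, p3])
```

Because a weighted mean of membership matrices again has rows summing to one, the update in step 2 stays within the set of soft partitions. A hard consensus would need a different membership step, for example assigning each object to its largest averaged membership; that variant is not shown here.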

References

BARTHÉLEMY, J. P. and MONJARDET, B. (1981): The median procedure in cluster analysis and social choice theory. Mathematical Social Sciences, 1, 235-267.
BARTHÉLEMY, J. P. and MONJARDET, B. (1988): The median procedure in data analysis: new results and open problems. In: H. H. Bock, editor, Classification and related methods of data analysis. North-Holland, Amsterdam, 309-316.
BOORMAN, S. A. and ARABIE, P. (1972): Structural measures and the method of sorting. In: R. N. Shepard, A. K. Romney and S. B. Nerlove, editors, Multidimensional Scaling: Theory and Applications in the Behavioral Sciences, 1: Theory. Seminar Press, New York, 225-249.
CHARON, I., DENOEUD, L., GUENOCHE, A. and HUDRY, O. (2006): Maximum transfer distance between partitions. Journal of Classification, 23(1), 103-121.
DAY, W. H. E. (1981): The complexity of computing metric distances between partitions. Mathematical Social Sciences, 1, 269-287.
DIMITRIADOU, E., WEINGESSEL, A. and HORNIK, K. (2002): A combination scheme for fuzzy clustering. International Journal of Pattern Recognition and Artificial Intelligence, 16(7), 901-912.
GAUL, W. and SCHADER, M. (1988): Clusterwise aggregation of relations. Applied Stochastic Models and Data Analysis, 4, 273-282.
GORDON, A. D. and VICHI, M. (1998): Partitions of partitions. Journal of Classification, 15, 265-285.
GORDON, A. D. and VICHI, M. (2001): Fuzzy partition models for fitting a set of partitions. Psychometrika, 66(2), 229-248.
GUSFIELD, D. (2002): Partition-distance: A problem and class of perfect graphs arising in clustering. Information Processing Letters, 82, 159-164.
HORNIK, K. (2005a): A CLUE for CLUster Ensembles. Journal of Statistical Software, 14(12). URL http://www.jstatsoft.org/v14/i12/.
HORNIK, K. (2005b): Cluster ensembles. In: C. Weihs and W. Gaul, editors, Classification - The Ubiquitous Challenge. Proceedings of the 28th Annual Conference of the Gesellschaft für Klassifikation e.V., University of Dortmund, March 9-11, 2004. Springer-Verlag, Heidelberg, 65-72.
HORNIK, K. (2007a): clue: Cluster Ensembles. R package version 0.3-12.
HORNIK, K. (2007b): On maximal euclidean partition dissimilarity. Under preparation.
HORNIK, K. and BÖHM, W. (2007): Alternating optimization algorithms for Euclidean and Manhattan consensus partitions. Under preparation.
MIRKIN, B. G. (1974): The problem of approximation in space of relations and qualitative data analysis. Automatika y Telemechanika, translated in: Information and Remote Control, 35, 1424-1438.
PAPADIMITRIOU, C. and STEIGLITZ, K. (1982): Combinatorial Optimization: Algorithms and Complexity. Prentice Hall, Englewood Cliffs.
ROSENBERG, S. (1982): The method of sorting in multivariate research with applications selected from cognitive psychology and person perception. In: N. Hirschberg and L. G. Humphreys, editors, Multivariate Applications in the Social Sciences. Erlbaum, Hillsdale, New Jersey, 117-142.
ROSENBERG, S. and KIM, M. P. (1975): The method of sorting as a data-gathering procedure in multivariate research. Multivariate Behavioral Research, 10, 489-502.
RUBIN, J. (1967): Optimal classification into groups: An approach for solving the taxonomy problem. Journal of Theoretical Biology, 15, 103-144.
WAKABAYASHI, Y. (1998): The complexity of computing median relations. Resenhas do Instituto de Mathematica ed Estadistica, Universidade de Sao Paolo, 3/3, 323-349.
ZHOU, D., LI, J. and ZHA, H. (2005): A new Mallows distance based metric for comparing clusterings. In: ICML ’05: Proceedings of the 22nd International Conference on Machine Learning. ISBN 1-59593-180-5. ACM Press, New York, NY, USA, 1028-1035.

Author information

Authors and Affiliations

Department of Statistics and Mathematics, Wirtschaftsuniversität Wien, A-1090, Wien, Austria
Kurt Hornik & Walter Böhm

Editor information

Editors and Affiliations

Institute of Computer Science and Institute of Business Economics and Information Systems, University of Hildesheim, Marienburgerplatz 22, 31141, Hildesheim, Germany
Christine Preisach
Lehrstuhl für Mustererkennung und Bildverarbeitung, Universität Freiburg, Gebäude 052, 79110, Freiburg i. Br, Germany
Hans Burkhardt
Institute of Computer Science and Institute of Business Economics and Information Systems, Marienburgerplatz 22, 31141, Hildesheim, Germany
Lars Schmidt-Thieme
Fakultät für Wirtschaftswissenschaften, Lehrstuhl für Betriebswirtschaftslehre, insbes. Marketing, Universitätsstraße 25, 33615, Bielefeld, Germany
Reinhold Decker

Cite this paper

Hornik, K., Böhm, W. (2008). Hard and Soft Euclidean Consensus Partitions. In: Preisach, C., Burkhardt, H., Schmidt-Thieme, L., Decker, R. (eds) Data Analysis, Machine Learning and Applications. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78246-9_18

DOI: https://doi.org/10.1007/978-3-540-78246-9_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78239-1
Online ISBN: 978-3-540-78246-9
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)
