Abstract
In this paper, we introduce the weighted mean of a pair of clusterings. Given two clusterings C_1 and C_2, the weighted mean of C_1 and C_2 is a clustering C_w whose distances d(C_1, C_w) and d(C_w, C_2) to C_1 and C_2, respectively, satisfy d(C_1, C_w) + d(C_w, C_2) = d(C_1, C_2) for some clustering distance function d. We present an algorithm for its computation and report experimental results on both synthetic and real data that illustrate the usefulness of the weighted mean concept. In particular, it provides a tool for cluster ensemble techniques.
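The defining condition can be made concrete with a small sketch. The abstract does not describe the algorithm itself, so the following is only an illustration under stated assumptions: d is taken to be the partition distance (the minimum number of elements that must be removed so that two clusterings coincide), clusterings are label lists with integer labels starting at 0, and the function names align, partition_distance and weighted_mean are hypothetical. The label matching is found by brute force, which is only practical for a handful of clusters.

```python
from itertools import permutations


def align(c1, c2):
    """Relabel c2 so that it agrees with c1 on as many elements as possible.

    Assumes integer labels 0..k-1. Returns (partition distance, relabelled c2).
    Brute force over all label permutations: fine for a few clusters only.
    """
    k = max(max(c1), max(c2)) + 1
    best_perm, best_agree = None, -1
    for perm in permutations(range(k)):
        agree = sum(1 for a, b in zip(c1, c2) if a == perm[b])
        if agree > best_agree:
            best_perm, best_agree = perm, agree
    return len(c1) - best_agree, [best_perm[b] for b in c2]


def partition_distance(c1, c2):
    """Number of elements that disagree under the best label matching."""
    return align(c1, c2)[0]


def weighted_mean(c1, c2, moves):
    """Illustrative construction (an assumption, not the paper's algorithm).

    Start from c1 and move `moves` of the disagreeing elements into the
    cluster that the aligned c2 assigns them to.
    """
    dist, c2_aligned = align(c1, c2)
    if not 0 <= moves <= dist:
        raise ValueError("moves must lie between 0 and d(c1, c2)")
    cw = list(c1)
    moved = 0
    for i, (a, b) in enumerate(zip(c1, c2_aligned)):
        if moved == moves:
            break
        if a != b:
            cw[i] = b
            moved += 1
    return cw


c1 = [0, 0, 0, 1, 1, 1, 2, 2]
c2 = [1, 1, 0, 0, 0, 2, 2, 2]
total = partition_distance(c1, c2)
for moves in range(total + 1):
    cw = weighted_mean(c1, c2, moves)
    print(moves, cw, partition_distance(c1, cw), partition_distance(cw, c2))
```

Each intermediate clustering differs from the first input in at most `moves` labels and from the aligned second input in at most `total - moves` labels. Because the partition distance satisfies the triangle inequality, the two distances then sum exactly to the distance between the inputs. For more clusters, the brute-force matching could be replaced by an assignment solver such as SciPy's `scipy.optimize.linear_sum_assignment`.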
Cite this article
Franek, L., Jiang, X. & He, C. Weighted mean of a pair of clusterings. Pattern Anal Applic 17, 153–166 (2014). https://doi.org/10.1007/s10044-012-0304-8