Weighted mean of a pair of clusterings

  • Short Paper
Pattern Analysis and Applications

Abstract

In this paper, we introduce the weighted mean of a pair of clusterings. Given two clusterings $C_1$ and $C_2$, the weighted mean of $C_1$ and $C_2$ is a clustering $C_w$ whose distances $d(C_1, C_w)$ and $d(C_w, C_2)$ to $C_1$ and $C_2$, respectively, satisfy $d(C_1, C_w) + d(C_w, C_2) = d(C_1, C_2)$ for some clustering distance function $d$. In other words, $C_w$ lies between $C_1$ and $C_2$ with respect to $d$. We present an algorithm for its computation. Experimental results on both synthetic and real data illustrate the usefulness of the weighted mean concept; in particular, it provides a tool for cluster ensemble techniques.
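
As a minimal sketch of the defining equality d(C1, Cw) + d(Cw, C2) = d(C1, C2), the following Python snippet checks whether a candidate clustering satisfies it. The abstract leaves the distance function open, so the pair-disagreement count used here is an assumption chosen for simplicity, not necessarily a distance used in the paper. The function names and the label-list representation of a clustering are likewise hypothetical, and the snippet only verifies a candidate; it is not the computation algorithm the paper presents.

```python
from itertools import combinations


def pair_disagreement_distance(a, b):
    """Count element pairs that are co-clustered in exactly one clustering.

    Assumption: a clustering is a list of cluster labels, one per element.
    This distance is an illustrative choice, not taken from the paper.
    """
    if len(a) != len(b):
        raise ValueError("clusterings must label the same elements")
    return sum(
        (a[i] == a[j]) != (b[i] == b[j])
        for i, j in combinations(range(len(a)), 2)
    )


def is_weighted_mean(c1, c2, cw, d=pair_disagreement_distance):
    """Check the equality d(c1, cw) + d(cw, c2) == d(c1, c2)."""
    return d(c1, cw) + d(cw, c2) == d(c1, c2)


c1 = [0, 0, 1, 1]      # two clusters of two elements each
c2 = [0, 1, 2, 3]      # four singleton clusters
cw = [0, 0, 1, 2]      # only the second cluster of c1 has been split
other = [0, 1, 1, 0]   # regroups elements in a way neither c1 nor c2 does

print(is_weighted_mean(c1, c2, cw))     # True
print(is_weighted_mean(c1, c2, other))  # False
```

Under this assumed distance, `cw` is at distance 1 from both `c1` and `c2`, which are themselves at distance 2, so the equality holds. The clustering `other` is at distance 4 from `c1` and 2 from `c2`, so it does not lie between them.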

Author information

Corresponding author

Correspondence to Lucas Franek.

About this article

Cite this article

Franek, L., Jiang, X. & He, C. Weighted mean of a pair of clusterings. Pattern Anal Applic 17, 153–166 (2014). https://doi.org/10.1007/s10044-012-0304-8

  • DOI: https://doi.org/10.1007/s10044-012-0304-8