Finding centroid clusterings with entropy-based criteria

  • Short Paper
Knowledge and Information Systems

Abstract

We investigate the following problem: given a set of candidate clusterings for a common set of objects, find a centroid clustering that is most compatible with the input set. First, we propose a series of entropy-based distance functions for comparing clusterings. Such functions enable us to select the local centroid directly from the candidate set. Second, we present two combining methods for constructing the global centroid. The selected or combined centroid clustering is likely to be a good choice, i.e., top or middle ranked in terms of closeness to the true clustering. Finally, we evaluate the effectiveness of these methods on both artificial and real data sets.
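The idea of selecting a local centroid can be illustrated with a minimal Python sketch. The exact distance functions proposed in the paper are not reproduced in this abstract, so the sketch assumes one well-known entropy-based distance, the variation of information, VI(A, B) = H(A) + H(B) - 2 I(A; B), as a stand-in. The function names (`entropy`, `mutual_information`, `vi_distance`, `local_centroid`) are hypothetical, and choosing the candidate with the smallest summed distance to all candidates is an assumed reading of "most compatible with the input set", not necessarily the authors' criterion.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in nats) of the cluster-size distribution."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information (in nats) between two labelings of the same objects."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum(
        (c / n) * math.log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )


def vi_distance(a, b):
    """Variation of information: an assumed stand-in for the paper's distances."""
    return entropy(a) + entropy(b) - 2.0 * mutual_information(a, b)


def local_centroid(candidates):
    """Return the index of the candidate with the smallest total distance
    to all candidates, together with the list of totals."""
    totals = [
        sum(vi_distance(c, other) for other in candidates) for c in candidates
    ]
    best = min(range(len(candidates)), key=totals.__getitem__)
    return best, totals


# Four hypothetical candidate clusterings of the same four objects.
candidates = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],  # same partition as the first, with labels swapped
    [0, 0, 0, 1],
    [0, 1, 0, 1],
]
best, totals = local_centroid(candidates)
print(best, [round(t, 4) for t in totals])
```

Because the distance depends only on how objects are grouped, not on the label names, the first two candidates are at distance zero from each other and reinforce one another when totals are compared.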

Author information

Corresponding author

Correspondence to Tianming Hu.

About this article

Cite this article

Hu, T., Sung, S.Y. Finding centroid clusterings with entropy-based criteria. Knowl Inf Syst 10, 505–514 (2006). https://doi.org/10.1007/s10115-006-0017-7

  • DOI: https://doi.org/10.1007/s10115-006-0017-7
