Abstract
We investigate the following problem: given a set of candidate clusterings of a common set of objects, find a centroid clustering that is most compatible with the input set. First, we propose a series of entropy-based distance functions for comparing clusterings; such functions let us select the local centroid directly from the candidate set. Second, we present two combining methods for constructing the global centroid. The selected or combined centroid clustering is likely to be a good choice, i.e., ranked in the top or middle of the candidates in terms of closeness to the true clustering. Finally, we evaluate the effectiveness of both approaches on artificial and real data sets.
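To make the setting concrete, the sketch below uses variation of information, one well-known entropy-based distance between clusterings, and picks the local centroid as the candidate minimizing the total distance to all other candidates. This is an illustration under assumed definitions, not the paper's specific distance functions or combining methods; the function names `variation_of_information` and `local_centroid` are hypothetical.

```python
import math
from collections import Counter

def variation_of_information(labels_a, labels_b):
    """An entropy-based distance between two clusterings of the same objects.

    VI(A, B) = H(A) + H(B) - 2 * I(A; B): zero iff the two clusterings
    induce the same partition (up to label renaming), larger as they diverge.
    """
    n = len(labels_a)
    assert n == len(labels_b) and n > 0
    pa = Counter(labels_a)                   # cluster sizes in clustering A
    pb = Counter(labels_b)                   # cluster sizes in clustering B
    pab = Counter(zip(labels_a, labels_b))   # joint cluster-pair sizes
    h_a = -sum((c / n) * math.log(c / n) for c in pa.values())
    h_b = -sum((c / n) * math.log(c / n) for c in pb.values())
    mi = sum((c / n) * math.log((c / n) / ((pa[a] / n) * (pb[b] / n)))
             for (a, b), c in pab.items())
    return h_a + h_b - 2.0 * mi

def local_centroid(candidates):
    """Pick the candidate clustering with the smallest total distance
    to all candidates in the input set (the 'local' centroid)."""
    return min(candidates,
               key=lambda c: sum(variation_of_information(c, o)
                                 for o in candidates))
```

Note that the distance ignores label names: `[0,0,1,1]` and `[1,1,0,0]` are at distance zero, which is the behavior one wants when comparing clusterings rather than classifications.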
Hu, T., Sung, S.Y. Finding centroid clusterings with entropy-based criteria. Knowl Inf Syst 10, 505–514 (2006). https://doi.org/10.1007/s10115-006-0017-7