Abstract
Clustering ensembles are one of the most actively discussed topics in machine learning today. A clustering ensemble combines multiple partitions generated by different clustering algorithms into a single clustering solution. Genetic algorithms are well known for their ability to solve optimization problems, including the clustering ensemble problem. Despite major contributions to finding consensus cluster partitions with genetic algorithms, little has been said about two issues: initializing the population of genetic-based clustering ensemble algorithms through generative mechanisms, and producing cluster partitions with favorable fitness values in the first phase of a clustering ensemble. In this paper, a threshold fuzzy C-means algorithm, named TFCM, is proposed to address the problem of diversity in clustering, one of the most common problems in clustering ensembles. TFCM also increases the fitness of the cluster partitions it generates, which improves the performance of genetic-based clustering ensemble algorithms. The average fitness of the cluster partitions generated by TFCM is evaluated with three different objective functions and compared against that of other clustering algorithms. A simple genetic-based clustering ensemble algorithm, named SGCE, is also proposed; cluster partitions generated by TFCM and by other clustering algorithms serve as its initial populations, and the performance of SGCE is compared across these different initial populations. Experimental results on eleven real-world datasets demonstrate that TFCM improves the fitness of cluster partitions and that SGCE performs better when its initial population is generated by TFCM.
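For readers unfamiliar with the base algorithm, the following is a minimal sketch of standard fuzzy C-means (Bezdek 1981), which alternates between updating cluster centres and fuzzy memberships to minimise the objective J_m. It is not TFCM: the abstract does not describe TFCM's threshold mechanism, so none is implemented here. The function names `fuzzy_c_means` and `harden`, the default fuzzifier m = 2, the tolerance, and the random initialization are illustrative assumptions. Hardening memberships by argmax is one common way to turn a fuzzy result into a crisp partition that an ensemble could use as a population member; that step is also an assumption.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def _dist(a: Point, b: Point) -> float:
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fuzzy_c_means(
    data: List[Point], c: int, m: float = 2.0,
    eps: float = 1e-5, max_iter: int = 300, seed: int = 0,
) -> Tuple[List[List[float]], List[List[float]], float]:
    """Standard FCM sketch (not TFCM). Returns (centres, memberships u[i][k], J_m)."""
    if m <= 1.0:
        raise ValueError("fuzzifier m must be > 1")
    rng = random.Random(seed)
    n, dim = len(data), len(data[0])
    # Random initial memberships; each data point's memberships sum to 1.
    u = [[rng.random() for _ in range(n)] for _ in range(c)]
    for k in range(n):
        s = sum(u[i][k] for i in range(c))
        for i in range(c):
            u[i][k] /= s
    p = 2.0 / (m - 1.0)
    centres: List[List[float]] = []
    for _ in range(max_iter):
        # Centre update: v_i = sum_k u_ik^m x_k / sum_k u_ik^m
        centres = []
        for i in range(c):
            w = [u[i][k] ** m for k in range(n)]
            total = sum(w)
            centres.append([sum(w[k] * data[k][j] for k in range(n)) / total
                            for j in range(dim)])
        # Membership update: u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1))
        new_u = [[0.0] * n for _ in range(c)]
        for k in range(n):
            d = [_dist(data[k], centres[i]) for i in range(c)]
            zero = [i for i in range(c) if d[i] == 0.0]
            if zero:
                # Point coincides with a centre: share membership among those centres.
                for i in zero:
                    new_u[i][k] = 1.0 / len(zero)
                continue
            for i in range(c):
                new_u[i][k] = 1.0 / sum((d[i] / d[j]) ** p for j in range(c))
        shift = max(abs(new_u[i][k] - u[i][k]) for i in range(c) for k in range(n))
        u = new_u
        if shift < eps:  # stop once memberships have stabilised
            break
    objective = sum(u[i][k] ** m * _dist(data[k], centres[i]) ** 2
                    for i in range(c) for k in range(n))
    return centres, u, objective


def harden(u: List[List[float]]) -> List[int]:
    """Crisp labels: each point goes to the cluster with the largest membership."""
    c, n = len(u), len(u[0])
    return [max(range(c), key=lambda i: u[i][k]) for k in range(n)]


if __name__ == "__main__":
    pts = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]]
    centres, u, j = fuzzy_c_means(pts, c=2)
    # The first three points share one label and the last three the other.
    print(harden(u))
```

Running the sketch with different seeds or values of c is one simple way to obtain varied partitions; it is not a substitute for the threshold mechanism the paper proposes.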
About this article
Cite this article
Ghaemi, R., Sulaiman, M.N., Ibrahim, H. et al. A novel fuzzy C-means algorithm to generate diverse and desirable cluster solutions used by genetic-based clustering ensemble algorithms. Memetic Comp. 4, 49–71 (2012). https://doi.org/10.1007/s12293-012-0073-3