A novel fuzzy C-means algorithm to generate diverse and desirable cluster solutions used by genetic-based clustering ensemble algorithms

  • Regular Research Paper

Memetic Computing

Abstract

The clustering ensemble is one of the most widely discussed topics in machine learning today. A clustering ensemble combines multiple partitions, generated by different clustering algorithms, into a single consensus clustering solution. Genetic algorithms are well known for their ability to solve optimization problems, including the clustering ensemble problem. To date, despite major contributions on finding consensus partitions with genetic algorithms, little attention has been paid to two issues: initializing the population of genetic-based clustering ensemble algorithms through generative mechanisms, and producing cluster partitions with favorable fitness values in the first phase of a clustering ensemble. In this paper, a threshold fuzzy C-means algorithm, named TFCM, is proposed to address the problem of obtaining diverse cluster partitions, one of the most common problems in clustering ensembles. TFCM also increases the fitness of the cluster partitions it generates, which improves the performance of genetic-based clustering ensemble algorithms. The average fitness of the cluster partitions generated by TFCM is evaluated with three different objective functions and compared against that of other clustering algorithms. A simple genetic-based clustering ensemble algorithm, named SGCE, is also proposed; cluster partitions generated by TFCM and by other clustering algorithms serve as its initial population. The performance of SGCE is evaluated and compared across these different initial populations. Experimental results on eleven real-world datasets demonstrate that TFCM improves the fitness of cluster partitions and that SGCE performs better when its initial population is generated by TFCM.
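The abstract does not define how TFCM applies its threshold, so the following is only a minimal sketch, under stated assumptions, of the general pipeline it describes: standard fuzzy C-means (Bezdek's formulation, not TFCM) is run with several cluster counts and random seeds, each fuzzy result is hardened into a partition, and the pool of partitions is scored so it could serve as the initial population of a genetic clustering ensemble. The function names `fcm`, `harden`, `sse`, and `initial_population` are hypothetical, and the within-cluster sum of squared errors is an illustrative objective, not one of the paper's three fitness functions. The genetic consensus step (SGCE) is not shown.

```python
import random


def _dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fcm(data, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Standard (Bezdek-style) fuzzy C-means, NOT the paper's TFCM.

    Returns (centers, u) where u[i][j] is the membership of point i in
    cluster j; each row of u sums to 1. No thresholding step is applied.
    """
    rng = random.Random(seed)  # seed controls the random initial memberships
    n, dim = len(data), len(data[0])
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-12 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = [[0.0] * dim for _ in range(c)]
    power = 1.0 / (m - 1.0)  # exponent 2/(m-1) applied to squared distances
    for _ in range(max_iter):
        # Center update: mean of the data weighted by u_ij^m.
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            wsum = sum(w)
            centers[j] = [sum(w[i] * data[i][d] for i in range(n)) / wsum
                          for d in range(dim)]
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1)).
        shift = 0.0
        for i in range(n):
            d2 = [_dist2(data[i], centers[j]) for j in range(c)]
            zero = [j for j in range(c) if d2[j] == 0.0]
            if zero:  # point coincides with a center: give it full membership
                new = [1.0 if j == zero[0] else 0.0 for j in range(c)]
            else:
                new = [1.0 / sum((d2[j] / d2[k]) ** power for k in range(c))
                       for j in range(c)]
            shift = max(shift, max(abs(a - b) for a, b in zip(new, u[i])))
            u[i] = new
        if shift < tol:  # stop once memberships have stabilized
            break
    return centers, u


def harden(u):
    """Turn fuzzy memberships into hard labels (cluster of maximum membership)."""
    return [max(range(len(row)), key=row.__getitem__) for row in u]


def sse(data, labels):
    """Within-cluster sum of squared errors (lower is better).
    Illustrative objective only; not one of the paper's three functions."""
    total = 0.0
    for lab in set(labels):
        members = [p for p, lbl in zip(data, labels) if lbl == lab]
        centroid = [sum(col) / len(members) for col in zip(*members)]
        total += sum(_dist2(p, centroid) for p in members)
    return total


def initial_population(data, cluster_counts, seeds):
    """Hypothetical helper: one hard partition per (cluster count, seed) pair,
    e.g. as the initial population of a genetic clustering ensemble."""
    return [harden(fcm(data, c, seed=s)[1])
            for c in cluster_counts for s in seeds]


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
    population = initial_population(pts, cluster_counts=[2, 3], seeds=[0, 1])
    avg = sum(sse(pts, p) for p in population) / len(population)
    print(f"{len(population)} partitions, average SSE = {avg:.3f}")
```

Varying the cluster count and seed is just one simple way to obtain diverse partitions, and averaging an objective over the pool mirrors the "fitness average" comparison in spirit only. The sketch uses plain Python so it runs without extra dependencies; a vectorized implementation would be preferable for real datasets.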

Author information

Correspondence to Reza Ghaemi.

About this article

Cite this article

Ghaemi, R., Sulaiman, M.N., Ibrahim, H. et al. A novel fuzzy C-means algorithm to generate diverse and desirable cluster solutions used by genetic-based clustering ensemble algorithms. Memetic Comp. 4, 49–71 (2012). https://doi.org/10.1007/s12293-012-0073-3

  • DOI: https://doi.org/10.1007/s12293-012-0073-3
