Abstract
Partitional clustering is a common approach to cluster analysis. Although many algorithms have been proposed, partitional clustering remains a challenging problem with respect to the reliability and efficiency of recovering high-quality solutions in terms of its criterion functions. In this paper, we propose a niching genetic k-means algorithm (NGKA) for partitional clustering, which aims to reliably and efficiently identify high-quality solutions in terms of the sum of squared errors (SSE) criterion. Within the NGKA, we design a niching method that encourages mating among similar clustering solutions while allowing some competition among dissimilar solutions, and we integrate it into a genetic algorithm to prevent premature convergence during the evolutionary clustering search. Further, we incorporate one step of the k-means operation into the regeneration steps of the resulting niching genetic algorithm to improve its computational efficiency. The proposed algorithm was applied to cluster both simulated data and gene expression data and was compared with previous work. Experimental results clearly show that the NGKA is an effective clustering algorithm and outperforms two other genetic-algorithm-based clustering methods implemented for comparison.
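The abstract names two concrete ingredients: the SSE criterion that the search minimizes and a single k-means step applied during regeneration. The sketch below illustrates both under stated assumptions, not as the authors' implementation. It assumes each candidate solution is encoded as a list of integer cluster labels, one per data point (an encoding chosen here for illustration). The helper `partition_distance` is hypothetical: it counts point pairs on which two solutions disagree, a label-permutation-invariant measure that is one plausible way to judge "similar clustering solutions" for niched mating; the paper's actual similarity measure is not given in this abstract.

```python
from itertools import combinations
from typing import List, Optional, Sequence

Point = Sequence[float]


def centroids(data: List[Point], labels: List[int], k: int) -> List[Optional[List[float]]]:
    """Mean of each cluster; None for an empty cluster (assumed handling)."""
    dim = len(data[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for x, c in zip(data, labels):
        counts[c] += 1
        for j in range(dim):
            sums[c][j] += x[j]
    return [[s / counts[c] for s in sums[c]] if counts[c] else None for c in range(k)]


def sq_dist(x: Point, m: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((xi - mi) ** 2 for xi, mi in zip(x, m))


def sse(data: List[Point], labels: List[int], k: int) -> float:
    """Sum of squared errors: squared distance of every point to its cluster mean."""
    cents = centroids(data, labels, k)
    # Every label in use has a non-empty cluster, so cents[c] is never None here.
    return sum(sq_dist(x, cents[c]) for x, c in zip(data, labels))


def kmeans_step(data: List[Point], labels: List[int], k: int) -> List[int]:
    """One k-means step: recompute centroids, then reassign each point to the
    nearest non-empty centroid. This never increases the SSE."""
    cents = centroids(data, labels, k)
    new_labels = []
    for x in data:
        best, best_d = labels[0], float("inf")
        for c, m in enumerate(cents):
            if m is None:
                continue  # skip empty clusters
            d = sq_dist(x, m)
            if d < best_d:
                best, best_d = c, d
        new_labels.append(best)
    return new_labels


def partition_distance(a: List[int], b: List[int]) -> int:
    """Hypothetical similarity measure for niching: number of point pairs
    grouped together in one solution but apart in the other."""
    return sum(
        (a[i] == a[j]) != (b[i] == b[j]) for i, j in combinations(range(len(a)), 2)
    )


if __name__ == "__main__":
    data = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
    labels = [0, 1, 0, 1]  # a poor initial solution
    print(sse(data, labels, 2))        # 200.0
    refined = kmeans_step(data, labels, 2)
    print(refined, sse(data, refined, 2))  # [0, 0, 1, 1] 1.0
```

On this toy data a single k-means step takes the poor solution from an SSE of 200.0 to 1.0, which illustrates why one such step per generation can speed up a genetic search. `partition_distance([0, 0, 1, 1], [1, 1, 0, 0])` is 0, because the two label vectors describe the same partition with the cluster names swapped.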
Cite this article
Sheng, W., Tucker, A. & Liu, X. A niching genetic k-means algorithm and its applications to gene expression data. Soft Comput 14, 9–19 (2010). https://doi.org/10.1007/s00500-008-0386-9