Abstract
This paper presents a new stochastic methodology, based on the concepts of genetic algorithms (GAs) and the greedy randomized adaptive search procedure (GRASP), for optimally clustering N objects into K clusters. The proposed algorithm (Hybrid GEN–GRASP) works in two phases: a genetic algorithm solves the feature selection problem, and a GRASP algorithm solves the clustering problem. Because its search is stochastic and population-based, the proposed algorithm can overcome the drawbacks of traditional clustering methods. Its performance is compared with that of a second methodology that solves the feature selection problem with Tabu Search, a very popular metaheuristic. Results from applying the methodology to data sets from the UCI Machine Learning Repository are presented.
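The two-phase structure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the fitness function, the GRASP construction rule, the local search, or any parameter values, so everything below is an assumption made for illustration. In particular, the sketch assumes that known class labels are available and scores a feature subset by cluster purity, builds the restricted candidate list from the objects farthest from the current centers, and uses K-means-style center updates as the local search. All function names (`grasp_cluster`, `gen_grasp`, `purity`) and defaults (`pop_size`, `generations`, `p_mut`, `rcl_size`) are hypothetical.

```python
import random
from collections import Counter

def sq_dist(a, b, feats):
    """Squared Euclidean distance restricted to the selected feature indices."""
    return sum((a[f] - b[f]) ** 2 for f in feats)

def assign(data, centers, feats):
    """Assign every object to its nearest center; return (labels, total cost)."""
    labels, cost = [], 0.0
    for x in data:
        dists = [sq_dist(x, c, feats) for c in centers]
        j = dists.index(min(dists))
        labels.append(j)
        cost += dists[j]
    return labels, cost

def grasp_cluster(data, k, feats, rng, iters=5, rcl_size=3):
    """GRASP-style clustering (assumed design): randomized greedy seeding, then local search."""
    best_labels, best_cost = None, float("inf")
    for _ in range(iters):
        # Construction phase: draw each new center at random from a restricted
        # candidate list (RCL) of the objects farthest from the current centers.
        centers = [rng.choice(data)]
        while len(centers) < k:
            ranked = sorted(
                data,
                key=lambda x: min(sq_dist(x, c, feats) for c in centers),
                reverse=True,
            )
            centers.append(rng.choice(ranked[:rcl_size]))
        # Local search phase: K-means-style center updates until no improvement.
        labels, cost = assign(data, centers, feats)
        while True:
            new_centers = []
            for j in range(k):
                members = [x for x, lab in zip(data, labels) if lab == j]
                new_centers.append(
                    [sum(col) / len(members) for col in zip(*members)]
                    if members else centers[j]
                )
            new_labels, new_cost = assign(data, new_centers, feats)
            if new_cost >= cost - 1e-12:
                break
            centers, labels, cost = new_centers, new_labels, new_cost
        if cost < best_cost:
            best_labels, best_cost = labels, cost
    return best_labels, best_cost

def purity(labels, truth, k):
    """Assumed fitness: fraction of objects that belong to their cluster's majority class."""
    hits = 0
    for j in range(k):
        classes = [t for lab, t in zip(labels, truth) if lab == j]
        if classes:
            hits += Counter(classes).most_common(1)[0][1]
    return hits / len(truth)

def gen_grasp(data, truth, k, seed=0, pop_size=8, generations=10, p_mut=0.1):
    """Phase 1: a GA evolves binary feature masks. Phase 2: GRASP scores each mask.
    Needs at least two features and pop_size >= 4."""
    rng = random.Random(seed)
    n_feat = len(data[0])

    def evaluate(mask):
        feats = [i for i, bit in enumerate(mask) if bit]
        labels, _ = grasp_cluster(data, k, feats, rng)
        return purity(labels, truth, k), labels

    def repair(mask):
        # Guarantee that at least one feature stays selected.
        if not any(mask):
            mask[rng.randrange(n_feat)] = 1
        return mask

    pop = [repair([rng.randint(0, 1) for _ in range(n_feat)]) for _ in range(pop_size)]
    scored = [(m, *evaluate(m)) for m in pop]  # (mask, fitness, labels)
    for _ in range(generations):
        scored.sort(key=lambda s: s[1], reverse=True)
        parents = scored[: pop_size // 2]  # truncation selection with elitism
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_feat)  # one-point crossover
            child = a[0][:cut] + b[0][cut:]
            child = repair([1 - g if rng.random() < p_mut else g for g in child])
            children.append((child, *evaluate(child)))
        scored = parents + children
    return max(scored, key=lambda s: s[1])

# Toy data (made up for this sketch): features 0 and 1 separate two classes,
# features 2 and 3 are pure noise.
toy_rng = random.Random(1)
data, truth = [], []
for cls, centre in enumerate([0.0, 10.0]):
    for _ in range(10):
        data.append([centre + toy_rng.random(), centre + toy_rng.random(),
                     100 * toy_rng.random(), 100 * toy_rng.random()])
        truth.append(cls)

mask, fit, labels = gen_grasp(data, truth, k=2)
print("selected features:", mask, "purity:", fit)
```

Replacing the genetic loop in `gen_grasp` with a Tabu Search over the same binary masks, while keeping `grasp_cluster` unchanged, would give a comparison in the spirit of the one the abstract describes.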
Cite this article
Marinakis, Y., Marinaki, M., Doumpos, M. et al. A hybrid stochastic genetic–GRASP algorithm for clustering analysis. Oper Res Int J 8, 33–46 (2008). https://doi.org/10.1007/s12351-008-0004-8
DOI: https://doi.org/10.1007/s12351-008-0004-8