Abstract
We propose a hybrid genetic algorithm for k-medoids clustering. A novel heuristic operator is designed and integrated with the genetic algorithm to fine-tune the search. Further, variable-length individuals that encode different numbers of medoids (clusters) are evolved, with a modified Davies-Bouldin index measuring the fitness of the corresponding partitionings. As a result, the proposed algorithm can efficiently evolve appropriate partitionings while making no a priori assumption about the number of clusters present in the datasets. In the experiments, we show the effectiveness of the proposed algorithm and compare it with other related clustering methods.
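To make the fitness idea concrete, the following is a minimal sketch of how a variable-length individual (a list of medoid indices) could be scored. It assumes the standard Davies-Bouldin index with medoids as cluster representatives, not the modified index used in the paper, whose details are not given in the abstract. The function names `assign_to_medoids` and `davies_bouldin` and the toy data are hypothetical, chosen only for illustration.

```python
import math

def assign_to_medoids(points, medoid_idx):
    """Label each point with the position (in medoid_idx) of its nearest medoid."""
    labels = []
    for p in points:
        dists = [math.dist(p, points[m]) for m in medoid_idx]
        labels.append(dists.index(min(dists)))
    return labels

def davies_bouldin(points, medoid_idx):
    """Standard Davies-Bouldin index using medoids as representatives.

    Lower values indicate compact, well-separated clusters.
    Assumption: medoids are distinct points, so no cluster is empty
    and no two medoids are at distance zero.
    """
    k = len(medoid_idx)
    if k < 2:
        raise ValueError("the index needs at least two medoids")
    labels = assign_to_medoids(points, medoid_idx)

    # Within-cluster scatter: mean distance from members to their medoid.
    scatter = []
    for i, m in enumerate(medoid_idx):
        members = [p for p, lab in zip(points, labels) if lab == i]
        scatter.append(sum(math.dist(p, points[m]) for p in members) / len(members))

    # For each cluster, take the worst (largest) similarity ratio to any other.
    total = 0.0
    for i in range(k):
        total += max(
            (scatter[i] + scatter[j])
            / math.dist(points[medoid_idx[i]], points[medoid_idx[j]])
            for j in range(k)
            if j != i
        )
    return total / k

# Toy data: two tight groups of three points each (illustrative only).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]

# Two individuals of different length, each a list of medoid indices.
two_medoids = [0, 3]
three_medoids = [0, 1, 3]

score_two = davies_bouldin(points, two_medoids)
score_three = davies_bouldin(points, three_medoids)
```

On this toy data the two-medoid individual receives the lower (better) score, which shows how an index of this kind lets individuals with different numbers of medoids compete on the same fitness scale.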
Cite this article
Sheng, W., Liu, X. A genetic k-medoids clustering algorithm. J Heuristics 12, 447–466 (2006). https://doi.org/10.1007/s10732-006-7284-z
DOI: https://doi.org/10.1007/s10732-006-7284-z