PSOHS: an efficient two-stage approach for data clustering

  • Regular Research Paper
Memetic Computing

Abstract

Cluster analysis is an important task in data mining. It refers to grouping a set of objects such that the similarities among objects within the same group are maximal while the similarities among objects from different groups are minimal. The particle swarm optimization (PSO) algorithm is a well-known metaheuristic optimization algorithm that has been successfully applied to the clustering problem. However, it has two major shortcomings. First, the PSO algorithm converges rapidly during the initial stages of the search, but its convergence becomes very slow near the global optimum. Second, it may get trapped in a local optimum if the global best and local best values remain equal to the particle's position over a certain number of iterations. In this paper, we hybridize PSO with a heuristic search algorithm to overcome these shortcomings. In the proposed algorithm, called PSOHS, particle swarm optimization is used to produce an initial solution to the clustering problem, and a heuristic search algorithm is then applied to improve the quality of this solution by searching around it. The superiority of the proposed PSOHS clustering method over other popular clustering methods is established on seven benchmark and real datasets: Iris, Wine, Crude Oil, Cancer, CMC, Glass and Vowel.
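
The two-stage idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the heuristic search, so the second stage below assumes a simple coordinate-wise search with a shrinking step size, and the objective is assumed to be the sum of squared Euclidean distances to the nearest centroid. All function names and parameter values (`pso_stage`, `heuristic_stage`, `w`, `c1`, `c2`, `step`) are hypothetical choices made for illustration.

```python
import random

def sse(data, centroids):
    """Assumed objective: squared distance of each point to its nearest centroid."""
    return sum(min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids)
               for p in data)

def to_centroids(x, k, dim):
    """Split a flat position vector into k centroids of length dim."""
    return [x[i * dim:(i + 1) * dim] for i in range(k)]

def pso_stage(data, k, n_particles=15, iters=60, w=0.72, c1=1.49, c2=1.49, rng=None):
    """Stage 1: standard global-best PSO over flattened centroid vectors."""
    rng = rng or random.Random(0)
    dim = len(data[0])

    def cost(x):
        return sse(data, to_centroids(x, k, dim))

    # Each particle starts from k randomly chosen data points.
    pos = [[v for p in rng.sample(data, k) for v in p] for _ in range(n_particles)]
    vel = [[0.0] * (k * dim) for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_cost = [cost(p) for p in pos]
    g = min(range(n_particles), key=pbest_cost.__getitem__)
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]
    for _ in range(iters):
        for i in range(n_particles):
            for j in range(k * dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][j] = (w * vel[i][j]
                             + c1 * r1 * (pbest[i][j] - pos[i][j])
                             + c2 * r2 * (gbest[j] - pos[i][j]))
                pos[i][j] += vel[i][j]
            c = cost(pos[i])
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[i][:], c
    return gbest, gbest_cost

def heuristic_stage(data, k, x, x_cost, step=0.5, min_step=1e-3, max_rounds=200):
    """Stage 2 (assumed form): search around x one coordinate at a time."""
    dim = len(data[0])
    x = x[:]
    for _ in range(max_rounds):
        if step <= min_step:
            break
        improved = False
        for j in range(len(x)):
            for delta in (step, -step):
                cand = x[:]
                cand[j] += delta
                c = sse(data, to_centroids(cand, k, dim))
                if c < x_cost:  # accept only strict improvements
                    x, x_cost, improved = cand, c, True
                    break
        if not improved:
            step /= 2  # narrow the neighbourhood once no move helps
    return x, x_cost

# Usage on synthetic two-cluster data (not one of the paper's datasets).
demo_rng = random.Random(1)
data = ([(demo_rng.gauss(0, 0.3), demo_rng.gauss(0, 0.3)) for _ in range(30)]
        + [(demo_rng.gauss(5, 0.3), demo_rng.gauss(5, 0.3)) for _ in range(30)])
coarse, coarse_cost = pso_stage(data, k=2, rng=random.Random(2))
refined, refined_cost = heuristic_stage(data, 2, coarse, coarse_cost)
print("PSO cost:", coarse_cost, "refined cost:", refined_cost)
```

Because the second stage accepts only strict improvements, the refined cost can never exceed the cost of the PSO solution it starts from; how much it improves depends on the data and on the assumed step schedule.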

Author information

Corresponding author

Correspondence to Abdolreza Hatamlou.

About this article

Cite this article

Hatamlou, A., Hatamlou, M. PSOHS: an efficient two-stage approach for data clustering. Memetic Comp. 5, 155–161 (2013). https://doi.org/10.1007/s12293-013-0110-x

  • DOI: https://doi.org/10.1007/s12293-013-0110-x
