Improving Group Search Optimization for Automatic Data Clustering Using Merge and Split Operators

  • Conference paper
Intelligent Systems (BRACIS 2022)

Abstract

The amount of digital data produced daily has increased considerably in recent years. The need for fast and reliable information in real-world applications demands ever more precise algorithms and Data Mining tools, since most of the systems in our daily lives run in real time. Data clustering is one of the most important and primitive activities in Unsupervised Machine Learning and a fundamental mechanism for exploratory data analysis. Given the complexity of the data clustering task, standard clustering methods, such as partitional algorithms, are easily trapped in local optima because they lack good global search operators. In this work, three improved Group Search Optimization-based approaches built on merge and split heuristics are proposed in the context of Automatic Clustering Analysis: MGSO, SGSO and MSGSO. Group Search Optimization (GSO) is a nature-inspired meta-heuristic known for its good global search abilities and its mechanisms for escaping local optima in the problem space. The proposed models attempt to perform both cluster optimization and the determination of the best number of clusters for each dataset, overcoming the limitations of traditional partitional clustering algorithms. The proposed GSO-based models are evaluated on a test bed composed of nine real-world problems and compared to six state-of-the-art partitional automatic clustering approaches, including standard GSO. The experimental evaluation considers five clustering metrics and both empirical and statistical analyses. The results show that the proposed MGSO, SGSO and MSGSO algorithms are very promising and reliable when tackling clustering problems.
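
The abstract does not describe how the merge and split heuristics are implemented. Purely as an illustration of the general idea, the following minimal sketch shows one way such operators could change the number of clusters in a centroid-based partition. It assumes Euclidean distance and a list-of-centroids representation; the function names (`merge_closest`, `split_widest`) and the selection rules are hypothetical and should not be read as the MGSO, SGSO or MSGSO operators.

```python
import math

def assign(points, centroids):
    """Group points by their nearest centroid (Euclidean distance)."""
    groups = [[] for _ in centroids]
    for p in points:
        k = min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
        groups[k].append(p)
    return groups

def mean(group):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(axis) / len(group) for axis in zip(*group))

def merge_closest(points, centroids):
    """Hypothetical merge: fuse the two closest centroids into one."""
    if len(centroids) <= 2:
        return list(centroids)  # assumption: never go below two clusters
    pairs = [(i, j) for i in range(len(centroids))
             for j in range(i + 1, len(centroids))]
    i, j = min(pairs, key=lambda ij: math.dist(centroids[ij[0]], centroids[ij[1]]))
    groups = assign(points, centroids)
    members = groups[i] + groups[j]
    # Fall back to the midpoint of the two centroids if both clusters are empty.
    fused = mean(members) if members else mean([centroids[i], centroids[j]])
    return [c for k, c in enumerate(centroids) if k not in (i, j)] + [fused]

def split_widest(points, centroids):
    """Hypothetical split: replace the most scattered cluster by two seeds."""
    groups = assign(points, centroids)
    k = max(range(len(centroids)),
            key=lambda c: sum(math.dist(p, centroids[c]) ** 2 for p in groups[c]))
    if len(groups[k]) < 2:
        return list(centroids)  # nothing to split
    a = max(groups[k], key=lambda p: math.dist(p, centroids[k]))  # farthest member
    b = max(groups[k], key=lambda p: math.dist(p, a))             # farthest from a
    return [c for idx, c in enumerate(centroids) if idx != k] + [a, b]

# Toy data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
split = split_widest(points, [(0.5, 0.5)])                    # 1 -> 2 centroids
merged = merge_closest(points, [(0, 0), (1, 0), (10, 10)])    # 3 -> 2 centroids
```

In a population-based search, operators of this kind would typically be applied to candidate solutions between iterations so that the number of clusters can change during the search; how and when the proposed models apply them is not stated in the abstract.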

Acknowledgements

The authors would like to thank FACEPE, CNPq and CAPES (Brazilian Research Agencies) for their financial support.

Author information

Corresponding author

Correspondence to Luciano D. S. Pacifico.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Pacifico, L.D.S., Ludermir, T.B. (2022). Improving Group Search Optimization for Automatic Data Clustering Using Merge and Split Operators. In: Xavier-Junior, J.C., Rios, R.A. (eds) Intelligent Systems. BRACIS 2022. Lecture Notes in Computer Science, vol 13653. Springer, Cham. https://doi.org/10.1007/978-3-031-21686-2_24

  • DOI: https://doi.org/10.1007/978-3-031-21686-2_24

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-21685-5

  • Online ISBN: 978-3-031-21686-2

  • eBook Packages: Computer Science, Computer Science (R0)
