Abstract
The amount of digital data produced daily has increased considerably in recent years. Because most of the systems in our daily lives run in real time, the need for fast and reliable information in real-world applications demands ever more precise algorithms and Data Mining tools. Data clustering is one of the most important and primitive activities in Unsupervised Machine Learning and a fundamental mechanism for exploratory data analysis. Given the complexity of the data clustering task, standard clustering methods, such as partitional algorithms, are easily trapped in local optima because they lack good global search operators. In this work, three improved Group Search Optimization-based approaches built on merge and split heuristics are proposed in the context of Automatic Clustering Analysis: MGSO, SGSO and MSGSO. Group Search Optimization (GSO) is a nature-inspired meta-heuristic known for its good global search abilities and its mechanisms for escaping from local optima in the problem space. The proposed models attempt to perform both cluster optimization and the determination of the best number of clusters for each dataset, overcoming the limitations of traditional partitional clustering algorithms. The proposed GSO-based models are evaluated on a test bed composed of nine real-world problems and compared to six state-of-the-art partitional automatic clustering approaches, including standard GSO. The experimental evaluation considers five clustering metrics and both empirical and statistical analysis. The results show that the proposed MGSO, SGSO and MSGSO algorithms are very promising and reliable when tackling clustering problems.
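The abstract does not say how the merge and split heuristics operate, so the following is only a generic sketch of the idea and not the authors' MGSO, SGSO or MSGSO operators. It assumes a centroid-based partition: a merge step fuses the two closest centroids (one cluster fewer), and a split step divides the most spread-out cluster in two (one cluster more). The function names `merge_closest` and `split_widest`, the midpoint merge rule and the farthest-point split rule are all illustrative assumptions.

```python
import math


def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
        for p in points
    ]


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def merge_closest(centroids):
    """Hypothetical merge step: replace the two closest centroids by their midpoint."""
    if len(centroids) < 2:
        return list(centroids)
    pairs = [
        (math.dist(centroids[i], centroids[j]), i, j)
        for i in range(len(centroids))
        for j in range(i + 1, len(centroids))
    ]
    _, i, j = min(pairs)
    merged = tuple((a + b) / 2 for a, b in zip(centroids[i], centroids[j]))
    return [c for k, c in enumerate(centroids) if k not in (i, j)] + [merged]


def split_widest(points, centroids):
    """Hypothetical split step: divide the cluster with the largest mean
    point-to-centroid distance into two clusters."""
    labels = assign(points, centroids)
    members = {
        k: [p for p, lab in zip(points, labels) if lab == k]
        for k in range(len(centroids))
    }

    def spread(k):
        pts = members[k]
        return sum(math.dist(p, centroids[k]) for p in pts) / len(pts) if pts else 0.0

    k = max(range(len(centroids)), key=spread)
    pts = members[k]
    if len(pts) < 2:
        return list(centroids)
    # Seed the halves with the member farthest from the centroid and the
    # member farthest from that seed, then group members by nearer seed.
    a = max(pts, key=lambda p: math.dist(p, centroids[k]))
    b = max(pts, key=lambda p: math.dist(p, a))
    half_a = [p for p in pts if math.dist(p, a) <= math.dist(p, b)]
    half_b = [p for p in pts if math.dist(p, a) > math.dist(p, b)]
    if not half_b:  # all members coincide, so there is nothing to split
        return list(centroids)
    return [c for i, c in enumerate(centroids) if i != k] + [mean(half_a), mean(half_b)]


points = [(0, 0), (0, 1), (10, 0), (10, 1)]
after_split = split_widest(points, [(5, 0.5)])
after_merge = merge_closest(after_split + [(0.0, 1.5)])
```

In an automatic clustering setting, operators of this kind let a search procedure change the number of clusters between iterations instead of fixing it in advance; how and when the proposed algorithms apply them is described in the chapter itself, not here.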
Acknowledgements
The authors would like to thank FACEPE, CNPq and CAPES (Brazilian Research Agencies) for their financial support.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Pacifico, L.D.S., Ludermir, T.B. (2022). Improving Group Search Optimization for Automatic Data Clustering Using Merge and Split Operators. In: Xavier-Junior, J.C., Rios, R.A. (eds) Intelligent Systems. BRACIS 2022. Lecture Notes in Computer Science, vol 13653. Springer, Cham. https://doi.org/10.1007/978-3-031-21686-2_24
DOI: https://doi.org/10.1007/978-3-031-21686-2_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-21685-5
Online ISBN: 978-3-031-21686-2
eBook Packages: Computer Science, Computer Science (R0)