Abstract
Data and text clustering are widely used in data mining, mainly to support big data analytics. The main challenge in these techniques is finding the most coherent clusters, that is, allocating similar objects to the same group. This paper proposes an improved clustering analysis approach based on an advanced optimization method called AOAOA. AOAOA improves the search performance of the Aquila optimizer (AO) by incorporating the operators of the arithmetic optimization algorithm (AOA) and differential evolution (DE), together with a novel transition mechanism. The primary motivation for this modification is that the original optimizer suffers from local optima stagnation and lacks search balance. AOAOA addresses these shortcomings by integrating several powerful search strategies with a new update strategy. Experiments are conducted in two parts, on eight standard data clustering datasets and ten text document benchmark datasets, to evaluate the performance of the proposed method. The proposed method is compared against several well-known optimization algorithms and advanced state-of-the-art methods published in the literature. The data clustering results show promising performance for AOAOA relative to the other data clustering methods. Moreover, the text document clustering results show that AOAOA can find new best solutions for several complicated cases. Overall, AOAOA obtained accurate and robust results compared with several state-of-the-art methods.
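To make the idea of clustering as an optimization problem concrete, the following is a minimal sketch of only one ingredient named in the abstract, a differential evolution step (the classic DE/rand/1/bin scheme), applied to centroid search. It is not the AOAOA method: the AO and AOA operators and the transition mechanism are not shown. The centroid encoding, the sum-of-squared-distances objective, the parameter values, and the names `sse`, `decode`, and `de_cluster` are all assumptions made for illustration.

```python
import random
from typing import List, Tuple

Point = List[float]


def sse(centroids: List[Point], data: List[Point]) -> float:
    """Sum of squared distances from each point to its nearest centroid
    (an assumed objective; the abstract does not state the one used)."""
    return sum(
        min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids)
        for p in data
    )


def decode(vec: List[float], k: int, dim: int) -> List[Point]:
    """Split a flat candidate vector into k centroids of length dim."""
    return [vec[i * dim:(i + 1) * dim] for i in range(k)]


def de_cluster(
    data: List[Point],
    k: int,
    pop_size: int = 20,
    iters: int = 100,
    f: float = 0.5,    # differential weight (assumed value)
    cr: float = 0.9,   # crossover rate (assumed value)
    seed: int = 0,
) -> Tuple[List[Point], float]:
    """Search for k centroids with plain DE/rand/1/bin (hypothetical helper)."""
    rng = random.Random(seed)
    dim = len(data[0])
    n = k * dim

    # Each candidate starts as k randomly chosen data points, flattened.
    pop = []
    for _ in range(pop_size):
        vec: List[float] = []
        for p in rng.sample(data, k):
            vec.extend(p)
        pop.append(vec)
    fit = [sse(decode(v, k, dim), data) for v in pop]

    for _ in range(iters):
        for i in range(pop_size):
            # Pick three distinct candidates other than i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(n)  # forces at least one mutated gene
            trial = [
                pop[a][j] + f * (pop[b][j] - pop[c][j])
                if (rng.random() < cr or j == j_rand)
                else pop[i][j]
                for j in range(n)
            ]
            t_fit = sse(decode(trial, k, dim), data)
            # Greedy selection: keep the trial only if it is no worse.
            if t_fit <= fit[i]:
                pop[i], fit[i] = trial, t_fit

    best = min(range(pop_size), key=fit.__getitem__)
    return decode(pop[best], k, dim), fit[best]


if __name__ == "__main__":
    toy = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
           [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
    centers, score = de_cluster(toy, k=2)
    print(centers, score)
```

Because selection is greedy, each candidate's objective value never increases across iterations. A hybrid such as the one described above would, under this reading, replace or interleave the trial-vector construction with other operators.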






Acknowledgements
The authors would like to thank the Deanship of Scientific Research at Umm Al-Qura University for supporting this work by Grant Code: (22UQU4320277DSR04).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Abualigah, L., Almotairi, K.H. Dynamic evolutionary data and text document clustering approach using improved Aquila optimizer based arithmetic optimization algorithm and differential evolution. Neural Comput & Applic 34, 20939–20971 (2022). https://doi.org/10.1007/s00521-022-07571-0
DOI: https://doi.org/10.1007/s00521-022-07571-0