Abstract
The exponential growth of unorganized text documents makes it increasingly difficult to retrieve the most relevant data. Document clustering is a prominent technique that organizes such content into clusters. Clustering is often performed on text documents that contain misleading or redundant information, which degrades clustering quality. In this study, a salp swarm algorithm (SSA) is used to cluster text documents. The approach uses similarity-based and distance-based measures as the objective function in the clustering domain. Experimental validation shows that the SSA-based similarity and distance measurement improves the quality of text document clustering. A comparison with existing methods shows that the proposed SSA clusters text documents better in terms of accuracy, sensitivity, specificity, and F-measure.
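
To make the idea concrete, the following is a minimal sketch of SSA-driven clustering, not the authors' implementation. It assumes documents are already term-weight vectors, that each salp encodes k centroids as one flat list, and that the objective blends cosine dissimilarity with Euclidean distance through a hypothetical weight `alpha`. The abstract does not specify how the two measures are combined, so that blend, the 0.5 leader threshold, and all function names here are illustrative assumptions.

```python
import math
import random

def cosine(a, b):
    """Cosine similarity; 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cost(position, docs, k, alpha=0.5):
    """Assumed objective (to minimize): mean over documents of the best
    blended score alpha*(1 - cosine) + (1 - alpha)*euclidean to any centroid."""
    dim = len(docs[0])
    centroids = [position[i * dim:(i + 1) * dim] for i in range(k)]
    total = 0.0
    for d in docs:
        total += min(alpha * (1 - cosine(d, c)) + (1 - alpha) * euclidean(d, c)
                     for c in centroids)
    return total / len(docs)

def ssa_cluster(docs, k, n_salps=20, iters=100, alpha=0.5, seed=0):
    """Search for k centroids with a basic salp swarm loop."""
    rng = random.Random(seed)
    dim = len(docs[0])
    lb = min(min(d) for d in docs)   # shared lower bound for every coordinate
    ub = max(max(d) for d in docs)   # shared upper bound for every coordinate
    n = k * dim
    salps = [[rng.uniform(lb, ub) for _ in range(n)] for _ in range(n_salps)]
    best = min(salps, key=lambda s: cost(s, docs, k, alpha))[:]
    best_cost = cost(best, docs, k, alpha)
    for it in range(1, iters + 1):
        # c1 decays over time, shifting the leader from exploration to exploitation
        c1 = 2 * math.exp(-((4 * it / iters) ** 2))
        for i in range(n_salps):
            if i == 0:
                # Leader moves around the food source (best solution so far)
                for j in range(n):
                    step = c1 * ((ub - lb) * rng.random() + lb)
                    if rng.random() >= 0.5:
                        salps[0][j] = best[j] + step
                    else:
                        salps[0][j] = best[j] - step
            else:
                # Follower moves to the midpoint between itself and its predecessor
                salps[i] = [(a + b) / 2 for a, b in zip(salps[i], salps[i - 1])]
            salps[i] = [min(max(x, lb), ub) for x in salps[i]]  # clamp to bounds
            c = cost(salps[i], docs, k, alpha)
            if c < best_cost:
                best, best_cost = salps[i][:], c
    return best, best_cost

# Toy usage: four 3-term "documents" forming two rough groups
docs = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0], [0.0, 0.1, 0.9]]
best, best_cost = ssa_cluster(docs, k=2)
```

The returned `best` holds the k centroids back to back; slicing it into chunks of `len(docs[0])` recovers them. Real text collections would need preprocessing and term weighting (for example TF-IDF) before this step, which the sketch leaves out.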

Funding
No funding is involved in this work.
Ethics declarations
Conflict of interest
The authors declare there is no conflict of interest.
Ethics approval and consent to participate
No human participants were involved in this work.
Human and animal rights
No violation of human and animal rights is involved.
About this article
Cite this article
Ponnusamy, M., Bedi, P., Suresh, T. et al. Design and analysis of text document clustering using salp swarm algorithm. J Supercomput 78, 16197–16213 (2022). https://doi.org/10.1007/s11227-022-04525-0
DOI: https://doi.org/10.1007/s11227-022-04525-0