Abstract
This paper proposes a novel text clustering method, an improved krill herd algorithm with a hybrid function, called MMKHA, as an efficient way to obtain promising and precise clustering results. Krill herd is a swarm-based optimization algorithm that imitates the behavior of a herd of live krill. The algorithm is promising because it has performed well against other optimization methods and because it balances exploration and exploitation by combining the strengths of local (nearby) search and global (wide-range) search. Text clustering is the process of grouping large numbers of text documents into coherent clusters, such that documents in the same cluster are relevant to one another. In the experiments, six versions of the algorithm are thoroughly investigated to determine the best one for the text clustering problem. Eight benchmark text datasets, available from the Laboratory of Computational Intelligence (LABIC), are used for the evaluation. Seven evaluation measures are used to validate the proposed algorithms: ASDC, accuracy, precision, recall, F-measure, purity, and entropy. The proposed algorithms are compared with other successful algorithms published in the literature. The results show that the proposed improved krill herd algorithm with the hybrid function achieved the best results in almost all cases across the datasets when compared with the other algorithms.
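As a rough illustration of two of the external measures named above, purity and entropy, the following Python sketch uses their common textbook definitions. The exact formulas used in the paper are not given in this abstract, so the definitions, the function names, and the toy labels below are all assumptions made for illustration only.

```python
import math
from collections import Counter


def _group_by_cluster(labels_true, labels_pred):
    """Collect the true class labels of the documents in each predicted cluster."""
    clusters = {}
    for true_label, cluster_id in zip(labels_true, labels_pred):
        clusters.setdefault(cluster_id, []).append(true_label)
    return clusters


def purity(labels_true, labels_pred):
    """Fraction of documents that belong to the majority class of their cluster.

    Assumed (textbook) definition; higher is better, 1.0 is a perfect score.
    """
    clusters = _group_by_cluster(labels_true, labels_pred)
    majority_total = sum(
        Counter(members).most_common(1)[0][1] for members in clusters.values()
    )
    return majority_total / len(labels_true)


def entropy(labels_true, labels_pred):
    """Size-weighted mean of per-cluster class entropy, scaled by log(#classes).

    Assumed (textbook) definition; lower is better, 0.0 means every cluster
    contains documents of a single class.
    """
    n_docs = len(labels_true)
    n_classes = len(set(labels_true))
    if n_classes < 2:
        return 0.0  # a single class cannot be mixed
    clusters = _group_by_cluster(labels_true, labels_pred)
    total = 0.0
    for members in clusters.values():
        size = len(members)
        cluster_entropy = -sum(
            (count / size) * math.log(count / size)
            for count in Counter(members).values()
        )
        total += (size / n_docs) * cluster_entropy / math.log(n_classes)
    return total


if __name__ == "__main__":
    # Hypothetical toy data: six documents, two true classes, two clusters.
    true_classes = ["a", "a", "a", "b", "b", "b"]
    predicted = [0, 0, 1, 1, 1, 1]
    print("purity :", purity(true_classes, predicted))
    print("entropy:", entropy(true_classes, predicted))
```

In this toy case one cluster is pure and the other mixes one "a" document with three "b" documents, so purity falls below 1 and entropy rises above 0. The two measures move in opposite directions as cluster quality improves, which is why they are usually reported together.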




Cite this article
Abualigah, L.M., Khader, A.T. & Hanandeh, E.S. Hybrid clustering analysis using improved krill herd algorithm. Appl Intell 48, 4047–4071 (2018). https://doi.org/10.1007/s10489-018-1190-6
DOI: https://doi.org/10.1007/s10489-018-1190-6