
Hybrid clustering analysis using improved krill herd algorithm

Applied Intelligence

Abstract

In this paper, a novel text clustering method, the improved krill herd algorithm with a hybrid function, called MMKHA, is proposed as an efficient clustering approach for obtaining promising and precise results in this domain. Krill herd is a swarm-based optimization algorithm that imitates the behavior of a herd of live krill. The potential of this algorithm is high because it performs better than other optimization methods: it balances exploration and exploitation by combining the strengths of local (nearby) search and global (wide-range) search. Text clustering is the process of grouping large numbers of text documents into coherent clusters, such that documents in the same cluster are relevant to one another. In the experiments, six versions of the algorithm are thoroughly investigated to determine the best version for solving the text clustering problem. Eight benchmark text datasets, available from the Laboratory of Computational Intelligence (LABIC), are used for the evaluation. Seven evaluation measures are used to validate the proposed algorithms, namely ASDC, accuracy, precision, recall, F-measure, purity, and entropy. The proposed algorithms are compared with other successful algorithms published in the literature. The results show that the proposed improved krill herd algorithm with the hybrid function achieved almost all of the best results across all datasets in comparison with the other algorithms.
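As an illustration of two of the external measures named above, the following is a minimal sketch of purity and entropy for a clustering with known class labels. It assumes the common textbook definitions (purity as the fraction of documents that belong to the majority class of their cluster, entropy as the size-weighted class entropy of each cluster, using the natural logarithm); the paper's exact formulas and normalization may differ. The function names `purity` and `entropy` and the toy labels are hypothetical and not taken from the article.

```python
from collections import Counter
from math import log


def _group_by_cluster(labels_true, labels_pred):
    """Collect the true class labels of the documents assigned to each cluster."""
    clusters = {}
    for true_label, cluster_id in zip(labels_true, labels_pred):
        clusters.setdefault(cluster_id, []).append(true_label)
    return clusters


def purity(labels_true, labels_pred):
    """Fraction of documents that fall in the majority class of their cluster.

    Assumed definition; higher is better, 1.0 means every cluster is pure.
    """
    clusters = _group_by_cluster(labels_true, labels_pred)
    n = len(labels_true)
    majority_total = sum(max(Counter(m).values()) for m in clusters.values())
    return majority_total / n


def entropy(labels_true, labels_pred):
    """Size-weighted class entropy over clusters (natural log).

    Assumed definition; lower is better, 0.0 means every cluster is pure.
    """
    clusters = _group_by_cluster(labels_true, labels_pred)
    n = len(labels_true)
    total = 0.0
    for members in clusters.values():
        size = len(members)
        h = -sum((c / size) * log(c / size) for c in Counter(members).values())
        total += (size / n) * h
    return total


# Toy example (hypothetical data): four documents, two classes, two clusters.
true_classes = ["sports", "sports", "politics", "politics"]
assigned = [0, 0, 0, 1]
print(purity(true_classes, assigned))
print(entropy(true_classes, assigned))
```

Under these definitions a perfect clustering has purity 1 and entropy 0, which is why the two measures are often reported together.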


Notes

  1. http://sites.labic.icmc.usp.br/text_collections/


Author information

Corresponding author

Correspondence to Laith Mohammad Abualigah.

About this article

Cite this article

Abualigah, L.M., Khader, A.T. & Hanandeh, E.S. Hybrid clustering analysis using improved krill herd algorithm. Appl Intell 48, 4047–4071 (2018). https://doi.org/10.1007/s10489-018-1190-6

  • DOI: https://doi.org/10.1007/s10489-018-1190-6
