A novel hybrid multi-verse optimizer with K-means for text documents clustering

  • Original Article
  • Neural Computing and Applications

Abstract

Text clustering partitions a document collection into subsets so that documents within a subset are similar (homogeneity) and documents in different subsets are dissimilar (heterogeneity). It has become a challenging research area that spans pattern recognition, information retrieval, and text mining. Metaheuristics are commonly used as efficient approaches to the text clustering problem. The multi-verse optimizer (MVO) is a recently proposed stochastic, population-based algorithm that has been applied successfully to many hard optimization problems. A recent research trend hybridizes two or more algorithms to obtain better solutions to optimization problems. In this paper, a new hybrid of MVO and the K-means clustering algorithm, H-MVO, is proposed; it improves the quality of both the initial candidate solutions and the best solution produced by MVO at each iteration. The hybrid aims to strengthen the global (diversification) ability of the search and to find better cluster partitions. The effectiveness of H-MVO was tested on five standard datasets from the data clustering domain, six standard text datasets from the text document clustering domain, and two datasets of scientific articles. The experiments showed that hybridizing MVO with K-means improves the convergence rate, accuracy, error rate, purity, entropy, recall, precision, and F-measure. Overall, H-MVO outperformed, or proved highly competitive with, the original MVO algorithm, well-known optimization algorithms (KHA, HS, PSO, GA, H-PSO, and H-GA), and clustering techniques (K-means, K-means++, DBSCAN, agglomerative, and spectral clustering).
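The abstract describes H-MVO only at a high level. The following is a minimal Python/NumPy sketch of one way K-means can be hybridized with MVO as described: K-means refines the initial universes and, at every iteration, the best universe. It is not the authors' implementation. The centroid-based encoding, the mean-distance objective, the number of K-means steps, the simplified roulette wheel, and all function names (`h_mvo`, `kmeans_refine`, `fitness`, `assign`) are assumptions made for illustration; the WEP/TDR schedules and the white-hole/black-hole and wormhole moves follow the usual MVO formulation.

```python
import numpy as np

# Hypothetical H-MVO sketch. Assumptions: each universe encodes k centroids,
# the objective is the mean document-to-nearest-centroid distance, and the
# roulette wheel is simplified. Not the authors' implementation.


def assign(X, centroids):
    """Nearest-centroid labels and distances (Euclidean)."""
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1), d.min(axis=1)


def fitness(X, centroids):
    """Assumed objective: mean distance from each document to its nearest centroid."""
    return assign(X, centroids)[1].mean()


def kmeans_refine(X, centroids, steps=3):
    """Run a few K-means (Lloyd) iterations starting from the given centroids."""
    c = centroids.copy()
    for _ in range(steps):
        labels, _ = assign(X, c)
        for k in range(len(c)):
            members = X[labels == k]
            if len(members):  # leave an empty cluster's centroid where it is
                c[k] = members.mean(axis=0)
    return c


def h_mvo(X, k, n_universes=20, max_iter=50, wep_min=0.2, wep_max=1.0, p=6.0, seed=0):
    rng = np.random.default_rng(seed)
    lb, ub = X.min(axis=0), X.max(axis=0)
    # Hybrid step 1: seed each universe with k random documents, then improve it with K-means.
    U = np.stack([kmeans_refine(X, X[rng.choice(len(X), k, replace=False)])
                  for _ in range(n_universes)])
    fit = np.array([fitness(X, u) for u in U])
    best, best_fit = U[fit.argmin()].copy(), fit.min()

    for t in range(1, max_iter + 1):
        wep = wep_min + t * (wep_max - wep_min) / max_iter  # wormhole existence probability
        tdr = 1 - t ** (1 / p) / max_iter ** (1 / p)         # travelling distance rate
        order = np.argsort(fit)                              # lowest (best) fitness first
        U, fit = U[order], fit[order]
        ni = fit / np.linalg.norm(fit)                       # normalized inflation rates
        probs = 1.0 / (fit + 1e-12)                          # roulette wheel favoring good universes
        probs /= probs.sum()
        new_U = U.copy()
        for i in range(1, n_universes):                      # universe 0 (current best) is kept
            for idx in np.ndindex(*U.shape[1:]):              # idx = (cluster, feature)
                if rng.random() < ni[i]:                     # white hole -> black hole exchange
                    w = rng.choice(n_universes, p=probs)
                    new_U[(i, *idx)] = U[(w, *idx)]
                if rng.random() < wep:                       # wormhole move around the best universe
                    f = idx[1]
                    step = tdr * ((ub[f] - lb[f]) * rng.random() + lb[f])
                    new_U[(i, *idx)] = best[idx] + step if rng.random() < 0.5 else best[idx] - step
        U = np.clip(new_U, lb, ub)
        fit = np.array([fitness(X, u) for u in U])
        if fit.min() < best_fit:
            best, best_fit = U[fit.argmin()].copy(), fit.min()
        # Hybrid step 2: refine the best universe with K-means; keep it only if it improves.
        refined = kmeans_refine(X, best)
        refined_fit = fitness(X, refined)
        if refined_fit < best_fit:
            best, best_fit = refined, refined_fit
    return best, best_fit
```

Raising `max_iter`, `n_universes`, or the K-means `steps` trades run time for solution quality. For TF-IDF document vectors, a cosine-based objective is a common alternative to the Euclidean distance assumed here.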
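Several of the reported criteria (purity, entropy, precision, recall, and F-measure) can be computed from a contingency table of true classes versus predicted clusters. The sketch below uses common text-clustering definitions: purity as the fraction of documents in their cluster's majority class, entropy as the size-weighted class entropy of each cluster (natural logarithm, no normalization), and F-measure as the size-weighted best-match F score of each class. The abstract does not give the paper's exact definitions, logarithm base, or normalization, so treat these, and the helper names, as assumptions.

```python
import numpy as np


def contingency(labels_true, labels_pred):
    """n[i, j] = number of documents of class i placed in cluster j."""
    _, y = np.unique(labels_true, return_inverse=True)
    _, c = np.unique(labels_pred, return_inverse=True)
    n = np.zeros((y.max() + 1, c.max() + 1))
    np.add.at(n, (y, c), 1)
    return n


def purity(labels_true, labels_pred):
    """Fraction of documents that belong to the majority class of their cluster."""
    n = contingency(labels_true, labels_pred)
    return n.max(axis=0).sum() / n.sum()


def entropy(labels_true, labels_pred):
    """Size-weighted class entropy of each cluster (lower is better)."""
    n = contingency(labels_true, labels_pred)
    n_j = n.sum(axis=0)                     # cluster sizes
    p = n / n_j                             # class distribution inside each cluster
    with np.errstate(divide="ignore", invalid="ignore"):
        e_j = -np.where(p > 0, p * np.log(p), 0.0).sum(axis=0)
    return (n_j / n.sum() * e_j).sum()


def f_measure(labels_true, labels_pred):
    """Each class is matched to the cluster with its best F score, weighted by class size."""
    n = contingency(labels_true, labels_pred)
    precision = n / n.sum(axis=0, keepdims=True)  # n_ij / n_j
    recall = n / n.sum(axis=1, keepdims=True)     # n_ij / n_i
    with np.errstate(divide="ignore", invalid="ignore"):
        f = np.where(n > 0, 2 * precision * recall / (precision + recall), 0.0)
    n_i = n.sum(axis=1)
    return (n_i / n.sum() * f.max(axis=1)).sum()
```

For example, with true labels `[0, 0, 0, 1, 1, 1]` and predicted clusters `[0, 0, 1, 1, 1, 1]`, purity is 5/6 because one document of class 0 lands in the cluster dominated by class 1.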

Acknowledgements

This work was supported by Universiti Sains Malaysia (USM) under Grant (1001/PKOMP/ 8014016).

Author information

Corresponding author

Correspondence to Ammar Kamal Abasi.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts of interest.

About this article

Cite this article

Abasi, A.K., Khader, A.T., Al-Betar, M.A. et al. A novel hybrid multi-verse optimizer with K-means for text documents clustering. Neural Comput & Applic 32, 17703–17729 (2020). https://doi.org/10.1007/s00521-020-04945-0

  • DOI: https://doi.org/10.1007/s00521-020-04945-0
