A parallel hybrid krill herd algorithm for feature selection

  • Original Article
International Journal of Machine Learning and Cybernetics

Abstract

This paper introduces a novel feature selection method for handling high-dimensional features in text clustering. Text clustering is a prominent direction in large-scale text mining: documents are partitioned into cohesive groups using a carefully selected subset of informative features. Swarm-based optimization techniques have been widely used to select relevant text features and have shown promising results on datasets of various sizes. However, the performance of traditional optimization algorithms tends to degrade severely on large-scale datasets. To address this, a novel parallel membrane-inspired framework is proposed to enhance the performance of the krill herd algorithm combined with the swap mutation strategy (MHKHA). In this framework, the krill herd algorithm is hybridized with the swap mutation strategy and incorporated within the parallel membrane framework. Finally, the k-means technique is applied to cluster the documents using the features selected by the krill herd algorithm. Seven benchmark datasets with varied characteristics are used. The results show that the proposed MHKHA produced superior results compared to other optimization methods. This paper offers the text mining community an alternative method based on cohesive and informative features.
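The abstract does not spell out the swap mutation operator, so the following is only a minimal sketch of the textbook form of swap mutation applied to a binary feature-selection mask. The mask encoding (1 = keep the feature, 0 = drop it), the function names `swap_mutation` and `select_features`, and the toy values are assumptions made for illustration and are not taken from the paper.

```python
import random


def swap_mutation(mask, rng):
    """Return a copy of `mask` with two randomly chosen positions exchanged.

    Assumption: a candidate solution is a binary list where 1 means the
    feature is kept. This is the generic swap operator, not necessarily
    the exact variant used in MHKHA.
    """
    mutated = list(mask)
    if len(mutated) < 2:
        return mutated  # nothing to swap
    i, j = rng.sample(range(len(mutated)), 2)  # two distinct indices
    mutated[i], mutated[j] = mutated[j], mutated[i]
    return mutated


def select_features(row, mask):
    """Keep only the values of `row` whose mask entry is 1."""
    return [value for value, keep in zip(row, mask) if keep]


rng = random.Random(0)            # seeded only to make the sketch repeatable
mask = [1, 0, 1, 1, 0, 0]         # hypothetical mask over six term features
mutated = swap_mutation(mask, rng)
reduced = select_features([0.5, 0.1, 0.0, 0.3, 0.2, 0.7], mutated)
```

Because a swap only exchanges two entries, the number of selected features stays the same; the mutation changes which features are selected, not how many. In a pipeline like the one the abstract outlines, the reduced document vectors would then be passed to a clustering step such as k-means.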

Author information

Corresponding author

Correspondence to Laith Abualigah.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Informed consent

Informed consent was obtained from all individual participants included in the study.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Abualigah, L., Alsalibi, B., Shehab, M. et al. A parallel hybrid krill herd algorithm for feature selection. Int. J. Mach. Learn. & Cyber. 12, 783–806 (2021). https://doi.org/10.1007/s13042-020-01202-7

  • DOI: https://doi.org/10.1007/s13042-020-01202-7
