A comparative study of nature-inspired metaheuristic algorithms using a three-phase hybrid approach for gene selection and classification in high-dimensional cancer datasets

  • Data analytics and machine learning
  • Soft Computing

Abstract

Identifying informative genes is essential for disease and cancer studies, and metaheuristic algorithms have been widely used for this purpose. However, their performance across the varied high-dimensional datasets of genomic studies has not been fully assessed. This work presents a comprehensive comparative analysis of three well-known nature-inspired metaheuristic algorithms, namely binary particle swarm optimization (BPSO), genetic algorithm (GA) and cuckoo search algorithm (CS), applied to gene selection and classification in twelve high-dimensional cancer datasets. The methodology uses a three-phase hybrid approach: pre-processing filtration with the Pearson product-moment correlation coefficient (PPMCC), followed by the metaheuristic algorithm and then the classification algorithm. For comparison, five different classification algorithms were used in each phase of the analysis. Applying the PPMCC filter reduced the computational complexity of the overall analysis. The comparative study showed that BPSO outperformed GA and CS in terms of accuracy. However, CS selected fewer genes and was less computationally complex than GA and BPSO.
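The three-phase structure described in the abstract (PPMCC filter, then a metaheuristic wrapper, then a classifier) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the swarm parameters (`w`, `c1`, `c2`, the velocity clip), the use of leave-one-out 1-nearest-neighbour accuracy as the fitness, and the synthetic data are all assumptions made for illustration; the abstract does not specify the classifiers, the fitness function or the parameter settings.

```python
import numpy as np


def pearson_filter(X, y, k):
    """Phase 1: indices of the k genes most correlated (in |r|) with the class label."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    # Constant genes have a zero denominator; their correlation is set to 0.
    r = (Xc * yc[:, None]).sum(axis=0) / np.where(denom == 0, 1.0, denom)
    return np.argsort(-np.abs(r))[:k]


def loo_1nn_accuracy(X, y):
    """Phase 3 (stand-in classifier): leave-one-out accuracy of 1-nearest-neighbour."""
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d, np.inf)  # a sample may not be its own neighbour
    return float((y[d.argmin(axis=1)] == y).mean())


def bpso_select(X, y, n_particles=10, n_iter=20, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Phase 2: binary PSO over gene masks, scored by the phase-3 classifier."""
    rng = np.random.default_rng(seed)
    n_genes = X.shape[1]

    def fitness(bits):
        mask = bits.astype(bool)
        return loo_1nn_accuracy(X[:, mask], y) if mask.any() else 0.0

    pos = (rng.random((n_particles, n_genes)) < 0.5).astype(int)
    vel = rng.uniform(-1.0, 1.0, (n_particles, n_genes))
    pbest = pos.copy()
    pbest_fit = np.array([fitness(p) for p in pos])
    g = pbest_fit.argmax()
    gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]

    for _ in range(n_iter):
        r1, r2 = rng.random(vel.shape), rng.random(vel.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        vel = np.clip(vel, -6.0, 6.0)
        # Sigmoid of the velocity gives the probability that a bit is set to 1.
        pos = (rng.random(vel.shape) < 1.0 / (1.0 + np.exp(-vel))).astype(int)
        fit = np.array([fitness(p) for p in pos])
        better = fit > pbest_fit
        pbest[better], pbest_fit[better] = pos[better], fit[better]
        if pbest_fit.max() > gbest_fit:
            g = pbest_fit.argmax()
            gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]
    return gbest.astype(bool), float(gbest_fit)


if __name__ == "__main__":
    # Synthetic stand-in for a microarray dataset: 40 samples, 200 genes,
    # of which the first 5 are shifted in class 1 (purely illustrative).
    rng = np.random.default_rng(1)
    y = np.repeat([0, 1], 20)
    X = rng.normal(size=(40, 200))
    X[y == 1, :5] += 3.0

    keep = pearson_filter(X, y, k=20)        # phase 1: filter
    mask, acc = bpso_select(X[:, keep], y)   # phases 2 and 3: wrapper + classifier
    print("selected genes:", sorted(keep[mask].tolist()), "accuracy:", acc)
```

The filter runs once and shrinks the search space before the wrapper starts, which is where the reduction in overall cost comes from: the wrapper evaluates the classifier once per particle per iteration, so each evaluation is cheaper on 20 genes than on 200. Swapping `bpso_select` for a GA or cuckoo search routine with the same signature would give the kind of like-for-like comparison the study describes.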

Author information

Corresponding author

Correspondence to Shilan S. Hameed.

Ethics declarations

Ethical approval

There are no ethical issues that may arise after the publication of this manuscript.

Conflict of interest

The authors declare that there is no conflict of interest regarding the publication of this work.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Hameed, S.S., Hassan, W.H., Latiff, L.A. et al. A comparative study of nature-inspired metaheuristic algorithms using a three-phase hybrid approach for gene selection and classification in high-dimensional cancer datasets. Soft Comput 25, 8683–8701 (2021). https://doi.org/10.1007/s00500-021-05726-0

  • DOI: https://doi.org/10.1007/s00500-021-05726-0
