Abstract
Gene expression data play a significant role in the development of effective cancer diagnosis and prognosis techniques. However, such data contain many redundant, noisy, and irrelevant genes (features), which reduce the predictive accuracy of diagnosis and increase the computational burden. To overcome these challenges, this article proposes a new hybrid filter/wrapper gene selection method called mRMR-BAOAC-SA. The method uses Minimum Redundancy Maximum Relevance (mRMR) as a first-stage filter to pick the top-ranked genes. Simulated Annealing (SA) and a crossover operator are then introduced into the Binary Arithmetic Optimization Algorithm (BAOA) to form a novel hybrid wrapper feature selection method, BAOAC-SA, which aims to discover the smallest set of informative genes for classification. In this enhanced version of BAOA, SA and crossover help the algorithm escape local optima and strengthen its global search capability. The proposed method was evaluated on 10 well-known microarray datasets, and its results were compared with those of other state-of-the-art gene selection methods. The experimental results show that the proposed approach outperforms the existing methods in terms of classification accuracy and the number of selected genes.
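The two operators named in the abstract, simulated annealing and crossover, can be illustrated on a binary gene mask with a short Python sketch. This is not the authors' implementation: the abstract does not specify the fitness function, the cooling schedule, the crossover type, or how these operators are embedded in BAOA, so the weighted-sum fitness, the geometric cooling, the uniform crossover, the toy error function, and all names and parameter values below are assumptions chosen for illustration. The mRMR filter stage and the BAOA update itself are omitted.

```python
import math
import random

# Hypothetical ground truth for the toy example: indices of "informative" genes.
INFORMATIVE = {0, 3}


def toy_error(mask):
    """Stand-in for a classifier's error rate: fraction of informative genes left out."""
    missing = sum(1 for g in INFORMATIVE if not mask[g])
    return missing / len(INFORMATIVE)


def fitness(mask, error_fn, alpha=0.99):
    """Assumed wrapper objective (lower is better): error plus a small size penalty."""
    if not any(mask):
        return float("inf")  # an empty gene subset cannot be classified
    return alpha * error_fn(mask) + (1 - alpha) * sum(mask) / len(mask)


def uniform_crossover(a, b, rng):
    """Copy each bit of the child from one of the two parents with equal probability."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def anneal(mask, error_fn, rng, t0=1.0, cooling=0.95, steps=200):
    """Simulated annealing on a binary mask, flipping one randomly chosen bit per step."""
    current, cur_fit = list(mask), fitness(mask, error_fn)
    best, best_fit = list(current), cur_fit
    t = t0
    for _ in range(steps):
        cand = list(current)
        i = rng.randrange(len(cand))
        cand[i] = 1 - cand[i]
        cand_fit = fitness(cand, error_fn)
        delta = cand_fit - cur_fit
        # Accept improvements always; accept worse moves with probability exp(-delta / t),
        # which is what lets the search leave a local optimum while t is still high.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, cur_fit = cand, cand_fit
            if cur_fit < best_fit:
                best, best_fit = list(current), cur_fit
        t *= cooling  # geometric cooling (assumed schedule)
    return best, best_fit


rng = random.Random(0)
start = [1] * 10                      # begin with all ten toy genes selected
start_fit = fitness(start, toy_error)
best, best_fit = anneal(start, toy_error, rng)
child = uniform_crossover(start, best, rng)
print(best, round(best_fit, 4))
```

Because `anneal` returns the best mask seen rather than the final one, its result is never worse than the starting mask under the chosen fitness. In a real wrapper, `toy_error` would be replaced by a cross-validated classifier error computed on the genes that survive the filter stage.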










Author information
Contributions
EP and EP designed the model and the computational framework. Both authors carried out the implementation, performed the experiments, and wrote the manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Pashaei, E., Pashaei, E. Hybrid binary arithmetic optimization algorithm with simulated annealing for feature selection in high-dimensional biomedical data. J Supercomput 78, 15598–15637 (2022). https://doi.org/10.1007/s11227-022-04507-2
DOI: https://doi.org/10.1007/s11227-022-04507-2