Abstract
This paper proposes a new gene selection method that combines a modified Minimum Redundancy Maximum Relevancy (MRMR) filter with an efficient wrapper built on a hybrid of the bat algorithm and β-hill climbing. Gene selection is the process of selecting discriminative genes that aid in developing efficient cancer diagnosis and classification. In general, current filter-based approaches produce a gene subset according to its discriminative power. However, a single filter tends to yield highly variable classification results. Accordingly, this study aims to improve MRMR by incorporating it into an ensemble of filters, increasing its robustness and stability. The filtering stage outputs a set of discriminative genes, which the wrapper stage uses to formulate the gene selection search space. In the wrapper stage, the bat algorithm is tailored to the gene selection problem and hybridized with a powerful local search method, β-hill climbing, to further stress exploitation during search space navigation and thus find a robust and stable set of discriminative genes. The bat-inspired algorithm (BA) is a recent swarm-based optimization method, while β-hill climbing is an exploratory local search. The proposed method is called Robust MRMR and Hybrid Bat-inspired Algorithm (rMRMR-HBA). To evaluate it, experiments are run on ten well-known microarray datasets that vary in the number of genes, samples, and classes. For performance evaluation, the proposed filtering approach (i.e., rMRMR) is first tested against the standard MRMR and other well-regarded filtering approaches. Thereafter, the wrapper approach (i.e., HBA) is evaluated by studying the convergence behavior of BA with and without β-hill climbing.
For comparative evaluation, the results of the proposed rMRMR-HBA are compared with state-of-the-art methods on the same microarray datasets. The comparative results show that the proposed approach achieved outstanding results in two out of ten datasets in terms of classification accuracy and minimum number of genes.
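To make the local search component concrete, the following is a minimal sketch of β-hill climbing applied to a binary gene mask. It is an illustration under stated assumptions, not the paper's implementation: the function name `beta_hill_climbing`, the toy objective `toy_fitness`, the `INFORMATIVE` index set, and all parameter values are hypothetical. The paper's wrapper evaluates candidate subsets by classification performance, which is replaced here by a simple stand-in score.

```python
import random

def beta_hill_climbing(fitness, n_genes, beta=0.05, iterations=500, seed=0):
    """Sketch of beta-hill climbing over a 0/1 gene-selection mask."""
    rng = random.Random(seed)
    # Start from a random subset: mask[i] == 1 means gene i is selected.
    current = [rng.randint(0, 1) for _ in range(n_genes)]
    current_fit = fitness(current)
    for _ in range(iterations):
        candidate = current[:]
        # N-operator: move to a neighbor by flipping one random bit.
        i = rng.randrange(n_genes)
        candidate[i] = 1 - candidate[i]
        # beta-operator: with probability beta, reset each bit at random.
        # This is the exploratory step that lets the search escape local optima.
        for j in range(n_genes):
            if rng.random() < beta:
                candidate[j] = rng.randint(0, 1)
        # S-operator: keep the candidate only if it is at least as good.
        candidate_fit = fitness(candidate)
        if candidate_fit >= current_fit:
            current, current_fit = candidate, candidate_fit
    return current, current_fit

# Hypothetical stand-in objective: reward a few "informative" genes and
# penalize subset size. A real wrapper would score a classifier instead.
INFORMATIVE = {1, 4, 7}

def toy_fitness(mask):
    hits = sum(mask[i] for i in INFORMATIVE)
    return hits - 0.1 * sum(mask)

best_mask, best_fit = beta_hill_climbing(toy_fitness, n_genes=10)
```

Because the acceptance step is greedy, the returned fitness can never be worse than that of the starting mask. In a hybrid such as the one described above, a routine of this kind would typically refine solutions produced by the swarm-based search.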
Cite this article
Alomari, O.A., Khader, A.T., Al-Betar, M.A. et al. A novel gene selection method using modified MRMR and hybrid bat-inspired algorithm with β-hill climbing. Appl Intell 48, 4429–4447 (2018). https://doi.org/10.1007/s10489-018-1207-1
DOI: https://doi.org/10.1007/s10489-018-1207-1