A novel gene selection method using modified MRMR and hybrid bat-inspired algorithm with β-hill climbing

Applied Intelligence

Abstract

This paper proposes a new gene selection method that combines a modified Minimum Redundancy Maximum Relevancy (MRMR) filter with a wrapper built on a bat algorithm hybridized with β-hill climbing. Gene selection is the process of selecting the discriminative genes that aid the development of efficient cancer diagnosis and classification. In general, current filter-based approaches produce a gene subset according to its discriminative power. However, a deficiency of single-filter approaches is the high variability of their classification results. Accordingly, this study aims to improve MRMR by combining it with an ensemble of filters to increase its robustness and stability. The filtering stage outputs a set of discriminative genes, and the wrapper stage uses that set to formulate the gene selection search space. In the wrapper, the bat algorithm is tailored to the gene selection problem and hybridized with a powerful local search method, β-hill climbing, to intensify the local search side of search-space navigation and thus find a robust and stable set of discriminative genes. The bat-inspired algorithm (BA) is a recent swarm-based optimization method, while β-hill climbing is an exploratory local search. The proposed method is called Robust MRMR and Hybrid Bat-inspired Algorithm (rMRMR-HBA). To evaluate it, experiments are run on ten well-known microarray datasets, which vary in the number of genes, samples, and classes. For performance evaluation, the proposed filtering approach (i.e., rMRMR) is first tested against standard MRMR and other well-regarded filtering approaches. Thereafter, the wrapper approach (i.e., HBA) is evaluated by studying the convergence behavior of BA with and without β-hill climbing.
For comparative evaluation, the results of the proposed rMRMR-HBA are compared with state-of-the-art methods on the same microarray datasets. The comparison shows that the proposed approach achieved outstanding results in two out of ten datasets in terms of classification accuracy and minimum number of genes.
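To make the wrapper's local search step concrete, the following is a minimal sketch of a β-hill-climbing-style search over a binary gene mask (1 = gene selected). It is an illustration under stated assumptions, not the authors' implementation: the function name `beta_hill_climbing`, the parameter defaults, the "accept if no worse" rule, and the stand-in `toy_fitness` (which rewards three hypothetical informative genes and penalizes subset size) are all chosen here for demonstration. In the actual method the fitness would come from a classifier evaluated on the selected genes, and the search would be embedded in the bat algorithm.

```python
import random


def beta_hill_climbing(fitness, n_genes, beta=0.05, iterations=2000, seed=0):
    """Maximise `fitness` over binary gene masks (illustrative sketch only)."""
    rng = random.Random(seed)
    current = [rng.randint(0, 1) for _ in range(n_genes)]
    current_fit = fitness(current)
    for _ in range(iterations):
        candidate = current[:]
        # Neighbourhood move: flip one randomly chosen gene.
        flip = rng.randrange(n_genes)
        candidate[flip] = 1 - candidate[flip]
        # Beta move: with probability beta, resample each bit at random.
        # This is the exploratory part that plain hill climbing lacks.
        for i in range(n_genes):
            if rng.random() < beta:
                candidate[i] = rng.randint(0, 1)
        candidate_fit = fitness(candidate)
        # Selection (assumed rule): keep the candidate if it is no worse.
        if candidate_fit >= current_fit:
            current, current_fit = candidate, candidate_fit
    return current, current_fit


# Hypothetical stand-in fitness: genes 2, 5 and 11 are "informative",
# and every selected gene costs a small penalty to favour small subsets.
INFORMATIVE = {2, 5, 11}


def toy_fitness(mask):
    hits = sum(mask[i] for i in INFORMATIVE)
    return hits - 0.1 * sum(mask)


best_mask, best_fit = beta_hill_climbing(toy_fitness, n_genes=20)
selected = [i for i, bit in enumerate(best_mask) if bit]
print(selected, round(best_fit, 2))
```

The size penalty in the toy fitness mirrors the abstract's two evaluation criteria, classification accuracy and a minimum number of genes; how the two are actually weighted in rMRMR-HBA is not stated in the abstract.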

Notes

  1. http://sdmc.lit.org.sg/GEDatasets/Datasets

Corresponding author

Correspondence to Osama Ahmad Alomari.

Cite this article

Alomari, O.A., Khader, A.T., Al-Betar, M.A. et al. A novel gene selection method using modified MRMR and hybrid bat-inspired algorithm with β-hill climbing. Appl Intell 48, 4429–4447 (2018). https://doi.org/10.1007/s10489-018-1207-1

  • DOI: https://doi.org/10.1007/s10489-018-1207-1