
A hybrid feature selection algorithm for microarray data

The Journal of Supercomputing

Abstract

In any microarray data set, only a small number of genes are informative, and the high dimensionality of the data makes gene selection a persistent challenge. To address this problem, we propose a dimensionality reduction algorithm named K-value maximum relevance minimum redundancy improved grey wolf optimizer (KMR2IGWO). First, in the KMR2 filter stage, K genes are selected. Second, the wrapper population is initialized from these K genes in two ways: by random feature selection and by selecting features in different proportions. Finally, the IGWO algorithm searches for the gene combination with the best classification accuracy by adjusting the parameters of the fitness function. The algorithm achieves substantial dimensionality reduction and is well suited to high-dimensional data sets. Experimental results show that the proposed KMR2IGWO strategy markedly reduces the dimensionality of microarray data and removes redundant features. On 14 microarray data sets, compared with four hybrid algorithms (mRMR + PSO, mRMR + GA, mRMR + BA, and mRMR + CS), the proposed algorithm achieves higher classification accuracy with shorter feature subsets. On five of the data sets, its average classification accuracy is 100%. Across the 14 data sets, the selected subsets retain only about 0.04% to 0.4% of the original genes.
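For readers who want a concrete picture of the two-stage design, the following Python sketch shows a minimal filter-plus-wrapper pipeline in the spirit of KMR2IGWO: an mRMR-style filter keeps K genes, and a binary grey wolf optimizer then searches for a compact, accurate subset. The function names (mrmr_filter, binary_gwo), the correlation-based redundancy term, the sigmoid binarization, and the fitness weight alpha are illustrative assumptions, not the paper's implementation; the paper's IGWO includes further improvements that are not reproduced here.

```python
# Minimal sketch of an mRMR-filter + binary-GWO-wrapper gene selection pipeline.
# All names and parameter choices are illustrative, not the paper's API.
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC


def mrmr_filter(X, y, K):
    """Greedily pick K genes: high relevance to y, low redundancy with the picks.
    Redundancy is approximated by absolute Pearson correlation (fine for a few
    thousand genes; the full pairwise matrix gets large beyond that)."""
    relevance = mutual_info_classif(X, y, random_state=0)
    corr = np.nan_to_num(np.abs(np.corrcoef(X, rowvar=False)))
    selected = [int(np.argmax(relevance))]
    while len(selected) < K:
        remaining = [j for j in range(X.shape[1]) if j not in selected]
        scores = [relevance[j] - corr[j, selected].mean() for j in remaining]
        selected.append(remaining[int(np.argmax(scores))])
    return np.array(selected)


def fitness(mask, X, y, alpha=0.99):
    """Weighted trade-off between cross-validated accuracy and subset compactness."""
    if mask.sum() == 0:
        return 0.0
    acc = cross_val_score(SVC(kernel="linear"), X[:, mask == 1], y, cv=5).mean()
    return alpha * acc + (1 - alpha) * (1 - mask.sum() / mask.size)


def binary_gwo(X, y, n_wolves=20, n_iter=50, seed=0):
    """Plain binary grey wolf optimizer (sigmoid transfer); stands in for the
    paper's IGWO, whose specific improvements are not reproduced here."""
    rng = np.random.default_rng(seed)
    dim = X.shape[1]
    pos = rng.random((n_wolves, dim))              # continuous positions in [0, 1]
    masks = (pos > 0.5).astype(int)
    fit = np.array([fitness(m, X, y) for m in masks])
    best_mask, best_fit = masks[np.argmax(fit)].copy(), fit.max()
    for t in range(n_iter):
        leaders = pos[np.argsort(fit)[::-1][:3]]   # alpha, beta, delta wolves
        a = 2 - 2 * t / n_iter                     # control parameter decreases 2 -> 0
        for i in range(n_wolves):
            new = np.zeros(dim)
            for leader in leaders:
                A = 2 * a * rng.random(dim) - a
                C = 2 * rng.random(dim)
                new += leader - A * np.abs(C * leader - pos[i])
            pos[i] = np.clip(new / 3.0, 0.0, 1.0)
            # sigmoid transfer turns the continuous position into a 0/1 gene mask
            prob = 1.0 / (1.0 + np.exp(-10.0 * (pos[i] - 0.5)))
            masks[i] = (prob > rng.random(dim)).astype(int)
            fit[i] = fitness(masks[i], X, y)
            if fit[i] > best_fit:
                best_mask, best_fit = masks[i].copy(), fit[i]
    return best_mask, best_fit
```

A typical run would first call mrmr_filter(X, y, K) to keep the K filtered genes and then binary_gwo(X[:, idx], y) on that reduced matrix. The fitness weight alpha is set close to 1 so that classification accuracy dominates subset length, which matches the paper's emphasis on keeping accuracy high while shrinking the gene set; the exact weighting used in KMR2IGWO may differ.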



Acknowledgements

This research is supported by the National Natural Science Foundation of China (NSFC) under Grant No. 61602206.

Author information

Correspondence to Gang Wang.

About this article

Cite this article

Zheng, Y., Li, Y., Wang, G. et al. A hybrid feature selection algorithm for microarray data. J Supercomput 76, 3494–3526 (2020). https://doi.org/10.1007/s11227-018-2640-y