Feature selection method based on hybrid data transformation and binary binomial cuckoo search

  • Original Research
Journal of Ambient Intelligence and Humanized Computing

Abstract

Feature selection is a key component of data mining and machine learning: it selects the best subset of features with respect to the target data by removing irrelevant data. However, selecting an optimal set of features with traditional feature selection methods is a complex task because, for n features, \(2^n\) feature subsets are possible. Therefore, this paper introduces a novel metaheuristic-based feature selection method based on binary binomial cuckoo search. Metaheuristic-based feature selection methods generally suffer from a stability issue, since they select different sets of features in different runs. To deal with this issue, a hybrid data transformation method based on principal component analysis and fast independent component analysis is also introduced. The proposed hybrid data transformation method first transforms the original data; the proposed binary binomial cuckoo search method is then used to select the best subset of features. The proposed feature selection method maximizes classification accuracy and minimizes the number of selected features. Its performance has been tested on fourteen feature selection benchmark datasets taken from the UCI repository and compared with recent state-of-the-art approaches, including binary cuckoo search, binary bat algorithm, binary gravitational search algorithm, binary whale optimization with simulated annealing, and binary grey wolf optimization. Further, statistical analysis has been carried out to validate the efficacy of the proposed method.
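To make the two competing objectives in the abstract concrete, the following is a minimal sketch of a wrapper-style search over binary feature masks. It is not the authors' binary binomial cuckoo search: the weighted objective (with weight `alpha`), the independent bit-flip move in `perturb`, the greedy acceptance loop, and the `toy_error` stand-in for a classifier's error are all assumptions introduced here for illustration. The hybrid PCA/FastICA transformation step is omitted.

```python
import random


def subset_count(n_features):
    """Number of candidate feature subsets for n features (2^n, counting the empty set)."""
    return 2 ** n_features


def fitness(mask, error_rate, alpha=0.99):
    """Assumed weighted objective (lower is better).

    Combines classification error with the fraction of selected features;
    alpha is a hypothetical trade-off weight, not a value from the paper.
    """
    selected_ratio = sum(mask) / len(mask)
    return alpha * error_rate + (1.0 - alpha) * selected_ratio


def perturb(mask, flip_prob, rng):
    """Flip each bit independently with probability flip_prob.

    The number of flipped bits then follows a binomial distribution.
    This is a generic move, not the operator defined in the paper.
    """
    new_mask = [bit ^ 1 if rng.random() < flip_prob else bit for bit in mask]
    if not any(new_mask):
        # Repair an empty selection by switching one random feature back on.
        new_mask[rng.randrange(len(new_mask))] = 1
    return new_mask


def toy_error(mask):
    """Hypothetical stand-in for validation error: only features 0 and 2 help."""
    return 0.5 - 0.2 * mask[0] - 0.2 * mask[2]


def search(n_features, iterations=200, seed=0):
    """Greedy search: keep a candidate mask only if it lowers the fitness."""
    rng = random.Random(seed)
    best = [1] * n_features
    best_fit = fitness(best, toy_error(best))
    for _ in range(iterations):
        candidate = perturb(best, 0.2, rng)
        candidate_fit = fitness(candidate, toy_error(candidate))
        if candidate_fit < best_fit:
            best, best_fit = candidate, candidate_fit
    return best, best_fit


if __name__ == "__main__":
    print(subset_count(10))
    print(search(6))
```

In a real wrapper method, `toy_error` would be replaced by the cross-validated error of a classifier trained on the selected (transformed) features, which is where most of the computational cost lies.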

Corresponding author

Correspondence to Avinash Chandra Pandey.

Cite this article

Pandey, A.C., Rajpoot, D.S. & Saraswat, M. Feature selection method based on hybrid data transformation and binary binomial cuckoo search. J Ambient Intell Human Comput 11, 719–738 (2020). https://doi.org/10.1007/s12652-019-01330-1

  • DOI: https://doi.org/10.1007/s12652-019-01330-1
