
Fast Gaussian kernel support vector machine recursive feature elimination algorithm


Abstract

Gaussian kernel support vector machine recursive feature elimination (GKSVM-RFE) is a method for feature ranking in a nonlinear way. However, GKSVM-RFE suffers from high computational complexity, which hinders its applications. This paper investigates the computational complexity of GKSVM-RFE and proposes two fast versions, collectively called fast GKSVM-RFE (FGKSVM-RFE), to speed up its recursive feature elimination procedure. For this purpose, we introduce approximate Gaussian kernels and design two kinds of ranking scores based on first-order and second-order approximation schemes. In each iteration, FGKSVM-RFE quickly calculates the approximate ranking scores according to these schemes and ranks features accordingly. Experimental results reveal that the proposed methods perform feature ranking faster than GKSVM-RFE while achieving comparable performance.
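The overall procedure follows the usual SVM-RFE loop: train a Gaussian-kernel SVM on the surviving features, score every candidate feature, and eliminate the lowest-scoring one. The following is a minimal sketch of that loop, not the authors' implementation: it assumes scikit-learn's SVC as the Gaussian-kernel SVM, a hypothetical `ranking_score` callback standing in for the paper's approximate scores (one of which is sketched in the Appendix below), and a schedule that removes one feature per round, which may differ from the schedule used in the paper.

```python
# Minimal sketch of RFE with an approximate ranking score (assumptions:
# scikit-learn's SVC as the Gaussian-kernel SVM, a user-supplied hypothetical
# `ranking_score(X, y, svm, remaining, m, gamma)`, one feature removed per round).
import numpy as np
from sklearn.svm import SVC

def rfe_with_approx_scores(X, y, ranking_score, gamma=0.5, C=1.0):
    remaining = list(range(X.shape[1]))   # candidate feature index set S
    eliminated = []                       # removed features, least important first
    while len(remaining) > 1:
        svm = SVC(kernel="rbf", gamma=gamma, C=C).fit(X[:, remaining], y)
        # approximate ranking score of each candidate feature under the current SVM
        scores = [ranking_score(X, y, svm, remaining, m, gamma) for m in remaining]
        worst = int(np.argmin(scores))    # smallest score = least useful feature
        eliminated.append(remaining.pop(worst))
    eliminated.append(remaining[0])
    return eliminated[::-1]               # ranking: most important feature first
```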



Acknowledgements

This work was supported in part by the Natural Science Foundation of the Jiangsu Higher Education Institutions of China under Grant No. 19KJA550002, by the Six Talent Peak Project of Jiangsu Province of China under Grant No. XYDXX-054, by the Priority Academic Program Development of Jiangsu Higher Education Institutions, and by the Collaborative Innovation Center of Novel Software Technology and Industrialization.

Author information


Corresponding author

Correspondence to Li Zhang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix

Proof of Theorem 1

Proof

Let \(\{\textbf{x}_{i},y_{i}\}_{i=1}^{n}\) be the training set, and \(S\) be the index set of candidate features. Given the solution \(\alpha_{i}\), \(i=1,\cdots,n\), to (4) and the Gaussian kernel parameter \(\gamma\), we can calculate the first-order approximate ranking score \(\tilde{c}^{(1)}_{m}\) for feature \(m\) using the first-order approximate Gaussian kernel (11). Namely,

$$ \begin{array}{@{}rcl@{}} \tilde{c}^{(1)}_{m} &=& \frac{1}{2}\left|\boldsymbol{\alpha}^{T}\tilde{\textbf{H}}_{1}^{S}\boldsymbol{\alpha}-\boldsymbol{\alpha}^{T}\tilde{\textbf{H}}_{1}^{S-\{m\}}\boldsymbol{\alpha}\right| \\ ~ &=& \frac{1}{2}\left|\sum\limits_{i=1}^{n}\sum\limits_{j=1}^{n} \left( \alpha_{i} \alpha_{j} y_{i} y_{j} \tilde{K}^{S}_{ij}-\alpha_{i} \alpha_{j} y_{i} y_{j} \tilde{K}_{ij}^{S-\{m\}}\right) \right| \end{array} $$
(21)

where \(\tilde {K}^{S}_{ij}=\tilde {k}_{1}\left (\textbf {x}^{S}_{i},\textbf {x}^{S}_{j}\right )\) and \(\tilde {K}_{ij}^{S-\{m\}}=\tilde {k}_{1}\left (\textbf {x}^{S-\{m\}}_{i},\textbf {x}^{S-\{m\}}_{j}\right )\).

By the triangle inequality, and noting that \(\alpha_{i}\geq 0\) and \(y_{i}\in\{-1,+1\}\) give \(|\alpha_{i}\alpha_{j}y_{i}y_{j}|=\alpha_{i}\alpha_{j}\), we can bound \(\tilde{c}^{(1)}_{m}\) as follows:

$$ \begin{array}{@{}rcl@{}} \tilde{c}^{(1)}_{m} & \leq & \frac{1}{2}\sum\limits_{i=1}^{n}\sum\limits_{j=1}^{n} \left|\alpha_{i} \alpha_{j} y_{i} y_{j} \left( \tilde{K}^{S}_{ij}-\tilde{K}_{ij}^{S-\{m\}}\right) \right| \end{array} $$
(22)
$$ \begin{array}{@{}rcl@{}} &\leq& \frac{1}{2}\sum\limits_{i=1}^{n}\sum\limits_{j=1}^{n} \alpha_{i} \alpha_{j} \left| \tilde{K}^{S}_{ij}-\tilde{K}_{ij}^{S-\{m\}}\right| \end{array} $$
(23)
$$ \begin{array}{@{}rcl@{}} &\leq & \frac{1}{2}\sum\limits_{i=1}^{n}\sum\limits_{j=1}^{n} \alpha_{i} \alpha_{j} \left( \left| \tilde{K}^{S}_{ij}\right|+\left|\tilde{K}_{ij}^{S-\{m\}}\right|\right) \end{array} $$
(24)

Further, we can expand the approximate kernel function and bound it as follows:

$$ \begin{array}{@{}rcl@{}} \left|\tilde{K}_{ij}^{S}\right|&=&\left|e^{-\gamma\left( \|\overline{\textbf{x}}_{i}\|_{2}^{2}+\|\overline{\textbf{x}}_{j}\|_{2}^{2}\right)} \left( 1+2\gamma\overline{\textbf{x}}_{i}^{T}\overline{\textbf{x}}_{j}\right)\right| \end{array} $$
(25)
$$ \begin{array}{@{}rcl@{}} &\leq& \left| e^{-\gamma\left( \|\overline{\textbf{x}}_{i}\|_{2}^{2}+\|\overline{\textbf{x}}_{j}\|_{2}^{2}\right)}\right| \left| 1+2\gamma\overline{\textbf{x}}_{i}^{T}\overline{\textbf{x}}_{j}\right| \end{array} $$
(26)
$$ \begin{array}{@{}rcl@{}} &\leq& \left|1+2\gamma\overline{\textbf{x}}_{i}^{T}\overline{\textbf{x}}_{j}\right| \end{array} $$
(27)

where \(\overline{\mathbf{x}}_{i}\) denotes \(\mathbf{x}_{i}\) restricted to the features in \(S\), i.e., \(\overline{\mathbf{x}}_{i}=\mathbf{x}^{S}_{i}\); the last inequality holds because \(e^{-\gamma\left(\|\overline{\mathbf{x}}_{i}\|_{2}^{2}+\|\overline{\mathbf{x}}_{j}\|_{2}^{2}\right)}\leq 1\) for \(\gamma>0\).
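As an illustration of how (21) and (25) translate into computation, here is a minimal NumPy sketch (an assumption-laden reconstruction, not the authors' code) of the first-order approximate Gaussian kernel and the resulting approximate ranking score; `alpha` and `y` are the SVM dual variables and labels, `S` is the candidate index set, and `m` is the feature being scored.

```python
# Sketch (not the authors' code): first-order approximate Gaussian kernel from
# (25) and the first-order approximate ranking score c~(1)_m from (21).
import numpy as np

def approx_gauss_kernel_1(Xs, gamma):
    """k~1(u, v) = exp(-gamma(||u||^2 + ||v||^2)) * (1 + 2*gamma*u^T v),
    evaluated for all pairs of rows of Xs (the samples restricted to S)."""
    sq = np.sum(Xs ** 2, axis=1)                          # ||x_i||^2
    damp = np.exp(-gamma * (sq[:, None] + sq[None, :]))   # exp(-gamma(||x_i||^2 + ||x_j||^2))
    return damp * (1.0 + 2.0 * gamma * (Xs @ Xs.T))

def approx_score_1(X, y, alpha, S, m, gamma):
    """c~(1)_m = 0.5 * |a^T K~(S) a - a^T K~(S - {m}) a| with a_i = alpha_i * y_i."""
    a = alpha * y
    K_S  = approx_gauss_kernel_1(X[:, S], gamma)
    K_Sm = approx_gauss_kernel_1(X[:, [j for j in S if j != m]], gamma)
    return 0.5 * abs(a @ K_S @ a - a @ K_Sm @ a)
```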

Similarly, we can obtain the upper bound for \(\left |\tilde {K}^{S-\{m\}}_{ij}\right |\). Namely,

$$ \begin{array}{@{}rcl@{}} \left|\tilde{K}^{S-\{m\}}_{ij}\right|\leq \left|1+2\gamma\left( \overline{\textbf{x}}^{(-m)}_{i}\right)^{T}\overline{\textbf{x}}^{(-m)}_{j}\right| \end{array} $$
(28)

where \(\overline{\mathbf{x}}^{(-m)}_{i}\) denotes \(\mathbf{x}_{i}\) restricted to the features in \(S-\{m\}\), i.e., \(\overline{\mathbf{x}}^{(-m)}_{i}=\mathbf{x}^{S-\{m\}}_{i}\).

We can rewrite \(\left (\overline {\textbf {x}}^{(-m)}_{i}\right )^{T}\left (\overline {\textbf {x}}^{(-m)}_{j}\right )\) in (28) as follows.

$$ \left( \overline{\textbf{x}}^{(-m)}_{i}\right)^{T}\left( \overline{\textbf{x}}^{(-m)}_{j}\right)=\overline{\textbf{x}}_{i}^{T}\overline{\textbf{x}}_{j}-{x_{i}^{m}} {x_{j}^{m}} $$
(29)

Substituting (29) into (28), we get

$$ \begin{array}{@{}rcl@{}} \left|\tilde{K}^{S-\{m\}}_{ij}\right|&\leq& \left|1+2\gamma\overline{\textbf{x}}_{i}^{T}\overline{\textbf{x}}_{j}-2\gamma {x_{i}^{m}} {x_{j}^{m}} \right| \end{array} $$
(30)
$$ \begin{array}{@{}rcl@{}} &\leq& \left|1+2\gamma\overline{\textbf{x}}_{i}^{T}\overline{\textbf{x}}_{j}\right|+\left|2\gamma {x_{i}^{m}} {x_{j}^{m}} \right| \end{array} $$
(31)

Substituting (27) and (31) into (24), we can obtain

$$ \begin{array}{@{}rcl@{}} \tilde{c}^{(1)}_{m} &\leq & \frac{1}{2}\sum\limits_{i=1}^{n}\sum\limits_{j=1}^{n} \alpha_{i} \alpha_{j} \left( 2\left|1+2\gamma\overline{\textbf{x}}_{i}^{T}\overline{\textbf{x}}_{j}\right|+\left|2\gamma {x_{i}^{m}} {x_{j}^{m}} \right|\right) \end{array} $$
(32)
$$ \begin{array}{@{}rcl@{}} &=&\sum\limits_{i=1}^{n}\sum\limits_{j=1}^{n} \alpha_{i} \alpha_{j} \left( \left|1+2\gamma\overline{\textbf{x}}_{i}^{T}\overline{\textbf{x}}_{j}\right|+\left|\gamma {x_{i}^{m}} {x_{j}^{m}} \right|\right) \end{array} $$
(33)

which completes the proof of this theorem. □
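A quick numerical sanity check of the final bound (33) on random data is sketched below (this is not from the paper; it reuses `approx_score_1` from the sketch above, draws `alpha` non-negative as in the SVM dual, and takes labels in {-1, +1}).

```python
# Sanity-check sketch: the right-hand side of (33) should dominate c~(1)_m.
import numpy as np

rng = np.random.default_rng(0)
n, d, gamma = 30, 8, 0.3
X = rng.standard_normal((n, d))
y = rng.choice([-1.0, 1.0], size=n)
alpha = rng.uniform(0.0, 1.0, size=n)                  # alpha_i >= 0
S, m = list(range(d)), 3                               # candidate set and scored feature

score = approx_score_1(X, y, alpha, S, m, gamma)       # c~(1)_m

Xbar = X[:, S]                                         # x-bar_i: features in S
inner = np.abs(1.0 + 2.0 * gamma * (Xbar @ Xbar.T))    # |1 + 2*gamma*xbar_i^T xbar_j|
cross = np.abs(gamma * np.outer(X[:, m], X[:, m]))     # |gamma * x_i^m * x_j^m|
bound = np.sum(np.outer(alpha, alpha) * (inner + cross))

print(score <= bound)                                  # expected: True
```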

About this article

Cite this article

Zhang, L., Zheng, X., Pang, Q. et al. Fast Gaussian kernel support vector machine recursive feature elimination algorithm. Appl Intell 51, 9001–9014 (2021). https://doi.org/10.1007/s10489-021-02298-2
