
Fast Gaussian kernel support vector machine recursive feature elimination algorithm


Abstract

Gaussian kernel support vector machine recursive feature elimination (GKSVM-RFE) is a method for feature ranking in a nonlinear way. However, GKSVM-RFE suffers from high computational complexity, which hinders its applications. This paper investigates the computational complexity of GKSVM-RFE and proposes two fast versions, collectively called fast GKSVM-RFE (FGKSVM-RFE), to speed up its recursive feature elimination procedure. For this purpose, we introduce approximate Gaussian kernels and design two kinds of ranking scores based on first-order and second-order approximation schemes. In each iteration, FGKSVM-RFE quickly calculates the approximate ranking scores according to these schemes and ranks features accordingly. Experimental results reveal that the proposed methods perform feature ranking faster than GKSVM-RFE while achieving comparable performance.
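The overall procedure follows the usual SVM-RFE loop: train a Gaussian-kernel SVM on the surviving features, score every candidate feature, and eliminate the lowest-scoring one. The following is a minimal sketch of that loop, not the authors' implementation: it assumes scikit-learn's SVC as the Gaussian-kernel SVM, a hypothetical `ranking_score` callback standing in for the paper's approximate scores (one of which is sketched in the Appendix below), and a schedule that removes one feature per round, which may differ from the schedule used in the paper.

```python
# Minimal sketch of RFE with an approximate ranking score (assumptions:
# scikit-learn's SVC as the Gaussian-kernel SVM, a user-supplied hypothetical
# `ranking_score(X, y, svm, remaining, m, gamma)`, one feature removed per round).
import numpy as np
from sklearn.svm import SVC

def rfe_with_approx_scores(X, y, ranking_score, gamma=0.5, C=1.0):
    remaining = list(range(X.shape[1]))   # candidate feature index set S
    eliminated = []                       # removed features, least important first
    while len(remaining) > 1:
        svm = SVC(kernel="rbf", gamma=gamma, C=C).fit(X[:, remaining], y)
        # approximate ranking score of each candidate feature under the current SVM
        scores = [ranking_score(X, y, svm, remaining, m, gamma) for m in remaining]
        worst = int(np.argmin(scores))    # smallest score = least useful feature
        eliminated.append(remaining.pop(worst))
    eliminated.append(remaining[0])
    return eliminated[::-1]               # ranking: most important feature first
```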



Acknowledgements

This work was supported in part by the Natural Science Foundation of the Jiangsu Higher Education Institutions of China under Grant No. 19KJA550002, by the Six Talent Peak Project of Jiangsu Province of China under Grant No. XYDXX-054, by the Priority Academic Program Development of Jiangsu Higher Education Institutions, and by the Collaborative Innovation Center of Novel Software Technology and Industrialization.

Author information


Corresponding author

Correspondence to Li Zhang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix

Proof of Theorem 1

Proof

Let \(\{\textbf{x}_{i},y_{i}\}_{i=1}^{n}\) be the training set, and \(S\) be the index set of candidate features. Given the solution \(\alpha_{i}\), \(i=1,\cdots,n\), to (4) and the Gaussian kernel parameter \(\gamma\), we can calculate the first-order approximate ranking score \(\tilde{c}^{(1)}_{m}\) for feature \(m\) using the first-order approximate Gaussian kernel (11). Namely,

$$ \begin{array}{@{}rcl@{}} \tilde{c}^{(1)}_{m} &=& \frac{1}{2}\left|\boldsymbol{\alpha}^{T}\tilde{\textbf{H}}_{1}^{S}\boldsymbol{\alpha}-\boldsymbol{\alpha}^{T}\tilde{\textbf{H}}_{1}^{S-\{m\}}\boldsymbol{\alpha}\right| \\ ~ &=& \frac{1}{2}\left|\sum\limits_{i=1}^{n}\sum\limits_{j=1}^{n} \left( \alpha_{i} \alpha_{j} y_{i} y_{j} \tilde{K}^{S}_{ij}-\alpha_{i} \alpha_{j} y_{i} y_{j} \tilde{K}_{ij}^{S-\{m\}}\right) \right| \end{array} $$
(21)

where \(\tilde {K}^{S}_{ij}=\tilde {k}_{1}\left (\textbf {x}^{S}_{i},\textbf {x}^{S}_{j}\right )\) and \(\tilde {K}_{ij}^{S-\{m\}}=\tilde {k}_{1}\left (\textbf {x}^{S-\{m\}}_{i},\textbf {x}^{S-\{m\}}_{j}\right )\).

By the triangle inequality, and noting that \(\alpha_{i}\geq 0\) and \(y_{i}\in\{-1,+1\}\) give \(|\alpha_{i}\alpha_{j}y_{i}y_{j}|=\alpha_{i}\alpha_{j}\), we can bound \(\tilde{c}^{(1)}_{m}\) as follows:

$$ \begin{array}{@{}rcl@{}} \tilde{c}^{(1)}_{m} & \leq & \frac{1}{2}\sum\limits_{i=1}^{n}\sum\limits_{j=1}^{n} \left|\alpha_{i} \alpha_{j} y_{i} y_{j} \left( \tilde{K}^{S}_{ij}-\tilde{K}_{ij}^{S-\{m\}}\right) \right| \end{array} $$
(22)
$$ \begin{array}{@{}rcl@{}} &\leq& \frac{1}{2}\sum\limits_{i=1}^{n}\sum\limits_{j=1}^{n} \alpha_{i} \alpha_{j} \left| \tilde{K}^{S}_{ij}-\tilde{K}_{ij}^{S-\{m\}}\right| \end{array} $$
(23)
$$ \begin{array}{@{}rcl@{}} &\leq & \frac{1}{2}\sum\limits_{i=1}^{n}\sum\limits_{j=1}^{n} \alpha_{i} \alpha_{j} \left( \left| \tilde{K}^{S}_{ij}\right|+\left|\tilde{K}_{ij}^{S-\{m\}}\right|\right) \end{array} $$
(24)

Further, we can expand the approximate kernel function and bound it as follows:

$$ \begin{array}{@{}rcl@{}} \left|\tilde{K}_{ij}^{S}\right|&=&\left|e^{-\gamma\left( \|\overline{\textbf{x}}_{i}\|_{2}^{2}+\|\overline{\textbf{x}}_{j}\|_{2}^{2}\right)} \left( 1+2\gamma\overline{\textbf{x}}_{i}^{T}\overline{\textbf{x}}_{j}\right)\right| \end{array} $$
(25)
$$ \begin{array}{@{}rcl@{}} &\leq& \left| e^{-\gamma\left( \|\overline{\textbf{x}}_{i}\|_{2}^{2}+\|\overline{\textbf{x}}_{j}\|_{2}^{2}\right)}\right| \left| 1+2\gamma\overline{\textbf{x}}_{i}^{T}\overline{\textbf{x}}_{j}\right| \end{array} $$
(26)
$$ \begin{array}{@{}rcl@{}} &\leq& \left|1+2\gamma\overline{\textbf{x}}_{i}^{T}\overline{\textbf{x}}_{j}\right| \end{array} $$
(27)

where \(\overline{\mathbf{x}}_{i}\) denotes \(\mathbf{x}_{i}\) restricted to the features in \(S\), i.e., \(\overline{\mathbf{x}}_{i}=\mathbf{x}^{S}_{i}\); the last inequality holds because \(e^{-\gamma\left(\|\overline{\mathbf{x}}_{i}\|_{2}^{2}+\|\overline{\mathbf{x}}_{j}\|_{2}^{2}\right)}\leq 1\) for \(\gamma>0\).
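As an illustration of how (21) and (25) translate into computation, here is a minimal NumPy sketch (an assumption-laden reconstruction, not the authors' code) of the first-order approximate Gaussian kernel and the resulting approximate ranking score; `alpha` and `y` are the SVM dual variables and labels, `S` is the candidate index set, and `m` is the feature being scored.

```python
# Sketch (not the authors' code): first-order approximate Gaussian kernel from
# (25) and the first-order approximate ranking score c~(1)_m from (21).
import numpy as np

def approx_gauss_kernel_1(Xs, gamma):
    """k~1(u, v) = exp(-gamma(||u||^2 + ||v||^2)) * (1 + 2*gamma*u^T v),
    evaluated for all pairs of rows of Xs (the samples restricted to S)."""
    sq = np.sum(Xs ** 2, axis=1)                          # ||x_i||^2
    damp = np.exp(-gamma * (sq[:, None] + sq[None, :]))   # exp(-gamma(||x_i||^2 + ||x_j||^2))
    return damp * (1.0 + 2.0 * gamma * (Xs @ Xs.T))

def approx_score_1(X, y, alpha, S, m, gamma):
    """c~(1)_m = 0.5 * |a^T K~(S) a - a^T K~(S - {m}) a| with a_i = alpha_i * y_i."""
    a = alpha * y
    K_S  = approx_gauss_kernel_1(X[:, S], gamma)
    K_Sm = approx_gauss_kernel_1(X[:, [j for j in S if j != m]], gamma)
    return 0.5 * abs(a @ K_S @ a - a @ K_Sm @ a)
```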

Similarly, we can obtain the upper bound for \(\left |\tilde {K}^{S-\{m\}}_{ij}\right |\). Namely,

$$ \begin{array}{@{}rcl@{}} \left|\tilde{K}^{S-\{m\}}_{ij}\right|\leq \left|1+2\gamma\left( \overline{\textbf{x}}^{(-m)}_{i}\right)^{T}\overline{\textbf{x}}^{(-m)}_{j}\right| \end{array} $$
(28)

where \(\overline{\mathbf{x}}^{(-m)}_{i}\) denotes \(\mathbf{x}_{i}\) restricted to the features in \(S-\{m\}\), i.e., \(\overline{\mathbf{x}}^{(-m)}_{i}=\mathbf{x}^{S-\{m\}}_{i}\).

We can rewrite \(\left (\overline {\textbf {x}}^{(-m)}_{i}\right )^{T}\left (\overline {\textbf {x}}^{(-m)}_{j}\right )\) in (28) as follows.

$$ \left( \overline{\textbf{x}}^{(-m)}_{i}\right)^{T}\left( \overline{\textbf{x}}^{(-m)}_{j}\right)=\overline{\textbf{x}}_{i}^{T}\overline{\textbf{x}}_{j}-{x_{i}^{m}} {x_{j}^{m}} $$
(29)

Substituting (29) into (28), we get

$$ \begin{array}{@{}rcl@{}} \left|\tilde{K}^{S-\{m\}}_{ij}\right|&\leq& \left|1+2\gamma\overline{\textbf{x}}_{i}^{T}\overline{\textbf{x}}_{j}-2\gamma {x_{i}^{m}} {x_{j}^{m}} \right| \end{array} $$
(30)
$$ \begin{array}{@{}rcl@{}} &\leq& \left|1+2\gamma\overline{\textbf{x}}_{i}^{T}\overline{\textbf{x}}_{j}\right|+\left|2\gamma {x_{i}^{m}} {x_{j}^{m}} \right| \end{array} $$
(31)

Substituting (27) and (31) into (24), we can obtain

$$ \begin{array}{@{}rcl@{}} \tilde{c}^{(1)}_{m} &\leq & \frac{1}{2}\sum\limits_{i=1}^{n}\sum\limits_{j=1}^{n} \alpha_{i} \alpha_{j} \left( 2\left|1+2\gamma\overline{\textbf{x}}_{i}^{T}\overline{\textbf{x}}_{j}\right|+\left|2\gamma {x_{i}^{m}} {x_{j}^{m}} \right|\right) \end{array} $$
(32)
$$ \begin{array}{@{}rcl@{}} &=&\sum\limits_{i=1}^{n}\sum\limits_{j=1}^{n} \alpha_{i} \alpha_{j} \left( \left|1+2\gamma\overline{\textbf{x}}_{i}^{T}\overline{\textbf{x}}_{j}\right|+\left|\gamma {x_{i}^{m}} {x_{j}^{m}} \right|\right) \end{array} $$
(33)

which completes the proof of this theorem. □
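A quick numerical sanity check of the final bound (33) on random data is sketched below (this is not from the paper; it reuses `approx_score_1` from the sketch above, draws `alpha` non-negative as in the SVM dual, and takes labels in {-1, +1}).

```python
# Sanity-check sketch: the right-hand side of (33) should dominate c~(1)_m.
import numpy as np

rng = np.random.default_rng(0)
n, d, gamma = 30, 8, 0.3
X = rng.standard_normal((n, d))
y = rng.choice([-1.0, 1.0], size=n)
alpha = rng.uniform(0.0, 1.0, size=n)                  # alpha_i >= 0
S, m = list(range(d)), 3                               # candidate set and scored feature

score = approx_score_1(X, y, alpha, S, m, gamma)       # c~(1)_m

Xbar = X[:, S]                                         # x-bar_i: features in S
inner = np.abs(1.0 + 2.0 * gamma * (Xbar @ Xbar.T))    # |1 + 2*gamma*xbar_i^T xbar_j|
cross = np.abs(gamma * np.outer(X[:, m], X[:, m]))     # |gamma * x_i^m * x_j^m|
bound = np.sum(np.outer(alpha, alpha) * (inner + cross))

print(score <= bound)                                  # expected: True
```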

About this article

Cite this article

Zhang, L., Zheng, X., Pang, Q. et al. Fast Gaussian kernel support vector machine recursive feature elimination algorithm. Appl Intell 51, 9001–9014 (2021). https://doi.org/10.1007/s10489-021-02298-2
