Abstract
Feature selection can pick out informative features and thus yield good performance on high-dimensional data. Two feature selection methods based on support vector data description (SVDD) have been proposed for one-class classification problems: SVDD-radius-recursive feature elimination (SVDD-RRFE) and SVDD-dual-objective-recursive feature elimination (SVDD-DRFE). However, both SVDD-RRFE and SVDD-DRFE use samples from only one class even when given a multi-class classification task, and both suffer from high computational complexity. To remedy these issues, this paper extends SVDD-RRFE and SVDD-DRFE to binary and multi-class classification problems using multiple SVDD models, and proposes fast feature ranking schemes for them in the case of the linear kernel. Experimental results on toy, UCI, and microarray datasets show the efficiency and feasibility of the proposed methods.
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China under Grant No. 61373093, by the Natural Science Foundation of Jiangsu Province of China under Grant No. BK20140008, and by the Soochow Scholar Project.
Appendix
A. Proof of Theorem 1
Proof
Assume that a linear SVDD model has been trained. Then we can obtain the center \({\textbf a}\in \mathbb {R}^{D}\) of the hypersphere and the set of support vectors \(SV\). Substituting \(J_{r}\) (6) and \({J_{r}^{k}}\) (7) into the radius ranking score \({J_{R}^{k}}\), we have

\[
{J_{R}^{k}}=\frac{1}{|SV|}\sum_{{\textbf x}_{sv}\in SV}\left[R^{2}({\textbf x}_{sv})-R^{2}({\textbf x}^{k}_{sv})\right], \tag{31}
\]

where \({\textbf {x}}_{sv}\in \mathbb {R}^{D}\) is a support vector, and \({\textbf {x}}^{k}_{sv}=[x_{(sv,1)},\cdots ,x_{(sv,k-1)},x_{(sv,k+1)},\cdots ,x_{(sv,D)}]^{T} \in \mathbb {R}^{D-1}\) is the same vector with the \(k\)-th feature removed.

According to (31), it is necessary to find the difference \(R^{2}({\textbf {x}}_{sv})-R^{2}({\textbf {x}}^{k}_{sv})\). Let \(\textbf {a}^{k}=[a_{1},\cdots ,a_{k-1},a_{k+1},\cdots ,a_{D}]^{T} \in \mathbb {R}^{D-1}\). Then, we have

\[
R^{2}({\textbf x}_{sv})-R^{2}({\textbf x}^{k}_{sv})
=\|{\textbf x}_{sv}-{\textbf a}\|^{2}-\|{\textbf x}^{k}_{sv}-{\textbf a}^{k}\|^{2}
=\left(\|{\textbf x}_{sv}\|^{2}-\|{\textbf x}^{k}_{sv}\|^{2}\right)
-2\left({\textbf x}_{sv}^{T}{\textbf a}-({\textbf x}^{k}_{sv})^{T}{\textbf a}^{k}\right)
+\left(\|{\textbf a}\|^{2}-\|{\textbf a}^{k}\|^{2}\right). \tag{32}
\]

Since

\[
\|{\textbf x}_{sv}\|^{2}-\|{\textbf x}^{k}_{sv}\|^{2}=x_{(sv,k)}^{2}, \tag{33}
\]

\[
{\textbf x}_{sv}^{T}{\textbf a}-({\textbf x}^{k}_{sv})^{T}{\textbf a}^{k}=x_{(sv,k)}a_{k}, \tag{34}
\]

and

\[
\|{\textbf a}\|^{2}-\|{\textbf a}^{k}\|^{2}=a_{k}^{2}, \tag{35}
\]

we substitute (33), (34) and (35) into (32), and get

\[
R^{2}({\textbf x}_{sv})-R^{2}({\textbf x}^{k}_{sv})=x_{(sv,k)}^{2}-2x_{(sv,k)}a_{k}+a_{k}^{2}=\left(x_{(sv,k)}-a_{k}\right)^{2}. \tag{36}
\]

Then, substituting (36) into (31), the radius ranking score can be rewritten as:

\[
{J_{R}^{k}}=\frac{1}{|SV|}\sum_{{\textbf x}_{sv}\in SV}\left(x_{(sv,k)}-a_{k}\right)^{2}. \tag{37}
\]
This completes the proof of Theorem 1. □
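The identity behind Theorem 1, that deleting feature \(k\) changes the squared radius of a fixed linear SVDD model by exactly \((x_{(sv,k)}-a_{k})^{2}\), can be checked numerically. The following sketch uses synthetic data and randomly drawn coefficients (it is an illustration, not the authors' implementation), comparing the naive radius difference against the closed form:

```python
import numpy as np

rng = np.random.default_rng(0)
n, D, k = 5, 4, 2                      # support vectors, features, feature to drop

X_sv = rng.normal(size=(n, D))         # rows play the role of support vectors
alpha = rng.random(n)
alpha /= alpha.sum()                   # SVDD coefficients sum to one
a = alpha @ X_sv                       # hypersphere center: a = sum_i alpha_i x_i

# Naive score: average drop in squared radius when feature k is deleted
X_k = np.delete(X_sv, k, axis=1)
a_k = np.delete(a, k)
naive = np.mean(np.sum((X_sv - a) ** 2, axis=1)
                - np.sum((X_k - a_k) ** 2, axis=1))

# Closed form from Theorem 1: average of (x_{sv,k} - a_k)^2 over support vectors
fast = np.mean((X_sv[:, k] - a[k]) ** 2)

assert np.allclose(naive, fast)
```

The closed form needs only the \(k\)-th column and the \(k\)-th coordinate of the center, which is what makes the fast ranking scheme cheap for high-dimensional data.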
B. Proof of Theorem 2
Proof
Assume that a linear SVDD model has been trained. Then we can obtain the coefficients \(\alpha_{i}\), \(i=1,\cdots,n\), of the hypersphere and the set of support vectors \(SV=\{{\textbf x}_{i}\,|\,\alpha_{i}>0,\ i=1,\cdots,n\}\). Substituting \(J_{d}\) (9) and \({J_{d}^{k}}\) (10) into the dual-objective ranking score \({J_{D}^{k}}\), we have

\[
{J_{D}^{k}}=\left(\sum_{i=1}^{n}\alpha_{i}{\textbf x}_{i}^{T}{\textbf x}_{i}-\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_{i}\alpha_{j}{\textbf x}_{i}^{T}{\textbf x}_{j}\right)
-\left(\sum_{i=1}^{n}\alpha_{i}({\textbf x}^{k}_{i})^{T}{\textbf x}^{k}_{i}-\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_{i}\alpha_{j}({\textbf x}^{k}_{i})^{T}{\textbf x}^{k}_{j}\right). \tag{38}
\]

Since

\[
{\textbf x}_{i}^{T}{\textbf x}_{i}-({\textbf x}^{k}_{i})^{T}{\textbf x}^{k}_{i}=x_{(i,k)}^{2} \tag{39}
\]

and

\[
{\textbf x}_{i}^{T}{\textbf x}_{j}-({\textbf x}^{k}_{i})^{T}{\textbf x}^{k}_{j}=x_{(i,k)}x_{(j,k)}, \tag{40}
\]

we substitute (39) and (40) into (38), and get

\[
{J_{D}^{k}}=\sum_{i=1}^{n}\alpha_{i}x_{(i,k)}^{2}-\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_{i}\alpha_{j}x_{(i,k)}x_{(j,k)}. \tag{41}
\]

Since only support vectors (those with \(\alpha_{i}>0\)) contribute to computing (41), we rewrite (41) as follows:

\[
{J_{D}^{k}}=\sum_{{\textbf x}_{i}\in SV}\alpha_{i}x_{(i,k)}^{2}-\sum_{{\textbf x}_{i}\in SV}\sum_{{\textbf x}_{j}\in SV}\alpha_{i}\alpha_{j}x_{(i,k)}x_{(j,k)}. \tag{42}
\]
This completes the proof of Theorem 2. □
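Theorem 2 states that the drop in the SVDD dual objective from deleting feature \(k\) depends only on the \(k\)-th column of the data. This too can be checked numerically with a small synthetic sketch (random data and coefficients are illustrative assumptions, not the paper's experiments):

```python
import numpy as np

rng = np.random.default_rng(1)
n, D, k = 6, 5, 3                      # samples, features, feature to drop

X = rng.normal(size=(n, D))
alpha = rng.random(n)
alpha /= alpha.sum()                   # all alpha_i > 0, so every row is a "support vector"

def dual_objective(Xm):
    """Linear-kernel SVDD dual: sum_i a_i x_i.x_i - sum_ij a_i a_j x_i.x_j."""
    K = Xm @ Xm.T
    return alpha @ np.diag(K) - alpha @ K @ alpha

# Naive score: retrain-free difference of dual objectives with and without feature k
naive = dual_objective(X) - dual_objective(np.delete(X, k, axis=1))

# Closed form from Theorem 2, using only column k
fast = alpha @ (X[:, k] ** 2) - (alpha @ X[:, k]) ** 2

assert np.allclose(naive, fast)
```

Note that the closed form is a weighted variance of column \(k\) under the weights \(\alpha\), so each feature can be scored in \(O(n)\) time after a single model is trained.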
Cite this article
Zhang, L., Lu, X. New fast feature selection methods based on multiple support vector data description. Appl Intell 48, 1776–1790 (2018). https://doi.org/10.1007/s10489-017-1054-5