
New fast feature selection methods based on multiple support vector data description


Abstract

Feature selection identifies informative features and can thereby improve performance on high-dimensional data. Two feature selection methods based on support vector data description (SVDD) have been proposed for one-class classification problems: SVDD-radius-recursive feature elimination (SVDD-RRFE) and SVDD-dual-objective-recursive feature elimination (SVDD-DRFE). However, both methods use samples from only one class even when a multi-class classification task is given, and both suffer from high computational complexity. To remedy these issues, this paper extends SVDD-RRFE and SVDD-DRFE to binary and multi-class classification problems using multiple SVDD models, and proposes fast feature ranking schemes for them in the case of the linear kernel. Experimental results on toy, UCI, and microarray datasets show the efficiency and feasibility of the proposed methods.



Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grant No. 61373093, by the Natural Science Foundation of Jiangsu Province of China under Grant No. BK20140008, and by the Soochow Scholar Project.

Author information


Corresponding author

Correspondence to Li Zhang.

Appendix

A. Proof of Theorem 1

Proof

Assume that a linear SVDD model has been trained. Then we can get the center \(\textbf{a}\) of the hypersphere and the set of support vectors \(SV\). Substituting \(J_{r}\) (6) and \({J_{r}^{k}}\) (7) into the radius ranking score \(JR_{k}\), we have

$$\begin{array}{@{}rcl@{}} JR_{k}&=& J_{r}-{J_{r}^{k}} \\ &=& \sum\limits_{{{\textbf{x}}_{sv}} \in {{SV}}} {\frac{{{R^{2}}\left( {{{\textbf{x}}_{sv}}} \right)}}{{\left| {SV} \right|}}}- \sum\limits_{{{\textbf{x}}_{sv}} \in {{SV}}} {\frac{{{R^{2}}\left( {{{\textbf{x}}^{k}_{sv}}} \right)}}{{\left| {SV} \right|}}} \end{array} $$
(31)

where \({\textbf{x}}_{sv}\in \mathbb{R}^{D}\) is a support vector, and \({\textbf{x}}^{k}_{sv}=[x_{(sv,1)},\cdots ,x_{(sv,k-1)},x_{(sv,k+1)},\cdots ,x_{(sv,D)}]^{T} \in \mathbb{R}^{D-1}\).

According to (31), it is necessary to find the difference \(R^{2}({\textbf{x}}_{sv})-R^{2}({\textbf{x}}^{k}_{sv})\). Let \(\textbf{a}^{k}=[a_{1},\cdots ,a_{k-1},a_{k+1},\cdots ,a_{D}] \in \mathbb{R}^{D-1}\). Then, we have

$$\begin{array}{@{}rcl@{}} &&R^{2}({\textbf{x}}_{sv})-R^{2}({\textbf{x}}^{k}_{sv})\\ &=&\|{\textbf{x}}_{sv}-\textbf{a}\|^{2}-\|{\textbf{x}}^{k}_{sv}-\textbf{a}^{k}\|^{2} \\ &=& {\textbf{x}}^{T}_{sv}{\textbf{x}}_{sv}-2{\textbf{x}}^{T}_{sv}{\textbf{a}} +\textbf{a}^{T}\textbf{a}-\\ &&\left( ({\textbf{x}}^{k}_{sv})^{T}{\textbf{x}}^{k}_{sv}-2({\textbf{x}}^{k}_{sv})^{T}{\textbf{a}^{k}} +(\textbf{a}^{k})^{T}\textbf{a}^{k} \right) \end{array} $$
(32)

Since

$$\begin{array}{@{}rcl@{}} {\textbf{x}}^{T}_{sv}{\textbf{x}}_{sv}-({\textbf{x}}^{k}_{sv})^{T}{\textbf{x}}^{k}_{sv}=x^{2}_{(sv,k)} \end{array} $$
(33)
$$\begin{array}{@{}rcl@{}} {\textbf{x}}^{T}_{sv}{\textbf{a}}-({\textbf{x}}^{k}_{sv})^{T}{\textbf{a}^{k}}=x_{(sv,k)}a_{k} \end{array} $$
(34)

and

$$\begin{array}{@{}rcl@{}} \textbf{a}^{T}\textbf{a}-(\textbf{a}^{k})^{T}\textbf{a}^{k}={a^{2}_{k}} \end{array} $$
(35)

we substitute (33), (34) and (35) into (32), and get

$$\begin{array}{@{}rcl@{}} R^{2}({\textbf{x}}_{sv})-R^{2}({\textbf{x}}^{k}_{sv})=x^{2}_{(sv,k)}-2x_{(sv,k)}a_{k}+{a^{2}_{k}} \end{array} $$
(36)

Then, substituting (36) into (31), the radius ranking score can be rewritten as:

$$\begin{array}{@{}rcl@{}} JR_{k}=\frac{1}{|SV|}\sum\limits_{{\textbf{x}}_{sv}\in SV} \left( x^{2}_{(sv,k)}-2x_{(sv,k)}a_{k}+{a^{2}_{k}}\right) \end{array} $$
(37)

This completes the proof of Theorem 1. □
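Note that (37) can be rewritten as \(JR_{k}=\frac{1}{|SV|}\sum_{{\textbf{x}}_{sv}\in SV}(x_{(sv,k)}-a_{k})^{2}\), i.e., the mean squared deviation of the \(k\)-th feature of the support vectors from the \(k\)-th coordinate of the center, so all \(D\) radius scores can be computed in \(O(|SV|\,D)\) time after a single SVDD training. The following minimal NumPy sketch (not the authors' code; the array names `X_sv` and `a` are ours) evaluates (37) for all features at once:

```python
import numpy as np

def radius_ranking_scores(X_sv, a):
    """Radius ranking scores JR_k from (37) for a trained linear SVDD model.

    X_sv : (|SV|, D) array whose rows are the support vectors.
    a    : (D,) array, center of the hypersphere.
    Returns a (D,) array whose k-th entry is JR_k.
    """
    # (37): JR_k = (1/|SV|) * sum_sv (x_{(sv,k)}^2 - 2 x_{(sv,k)} a_k + a_k^2)
    #            = mean over support vectors of (x_{(sv,k)} - a_k)^2
    return np.mean((X_sv - a) ** 2, axis=0)
```

Features would then be ranked by these scores and eliminated following the SVDD-RRFE scheme described in the main text.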

B. Proof of Theorem 2

Proof

Assume that a linear SVDD model has been trained. Then we can get the coefficients \(\alpha_{i}\), \(i=1,\cdots ,n\), of the hypersphere and the set of support vectors \(SV=\{{\textbf{x}}_{i} \mid \alpha_{i}>0, i=1,\cdots ,n\}\). Substituting \(J_{d}\) (9) and \({J_{d}^{k}}\) (10) into the dual-objective ranking score \(JD_{k}\), we have

$$\begin{array}{@{}rcl@{}} JD_{k}&=& J_{d}-{J_{d}^{k}} \\ &=& \sum\limits_{i=1}^{n}\alpha_{i}{\textbf{x}}_{i}^{T}{\textbf{x}}_{i} -\sum\limits_{i=1}^{n}\sum\limits_{j=1}^{n}\alpha_{i}\alpha_{j}{\textbf{x}}_{i}^{T}{\textbf{x}}_{j}- \\ &&\left( \sum\limits_{i=1}^{n}\alpha_{i}({\textbf{x}}^{k}_{i})^{T}{\textbf{x}}^{k}_{i} -\sum\limits_{i=1}^{n}\sum\limits_{j=1}^{n}\alpha_{i}\alpha_{j}({\textbf{x}}^{k}_{i})^{T}{\textbf{x}}^{k}_{j} \right) \end{array} $$
(38)

Since

$$\begin{array}{@{}rcl@{}} {\textbf{x}}_{i}^{T}{\textbf{x}}_{i}-({\textbf{x}}^{k}_{i})^{T}{\textbf{x}}^{k}_{i}=x^{2}_{(i,k)} \end{array} $$
(39)

and

$$\begin{array}{@{}rcl@{}} {\textbf{x}}_{i}^{T}{\textbf{x}}_{j}-({\textbf{x}}^{k}_{i})^{T}{\textbf{x}}^{k}_{j}=x_{(i,k)}x_{(j,k)} \end{array} $$
(40)

we substitute (39) and (40) into (38), and get

$$\begin{array}{@{}rcl@{}} JD_{k}=\sum\limits_{i=1}^{n}\alpha_{i} x^{2}_{(i,k)} -\sum\limits_{i=1}^{n}\sum\limits_{j=1}^{n}\alpha_{i}\alpha_{j} x_{(i,k)}x_{(j,k)} \end{array} $$
(41)

Since only support vectors have nonzero coefficients \(\alpha_{i}\) and thus contribute to (41), we can rewrite (41) as follows:

$$\begin{array}{@{}rcl@{}} JD_{k}&=&\sum\limits_{{\textbf{x}}_{sv}\in SV} \alpha_{sv} x^{2}_{(sv,k)}- \\ &&\sum\limits_{{\textbf{x}}_{sv}\in SV} \sum\limits_{{\textbf{x}}_{sv^{\prime}}\in SV}\alpha_{sv}\alpha_{sv^{\prime}} x_{(sv,k)}x_{(sv^{\prime},k)} \end{array} $$
(42)

This completes the proof of Theorem 2. □
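The double sum in (42) factorizes per feature as \(\big(\sum_{{\textbf{x}}_{sv}\in SV}\alpha_{sv}x_{(sv,k)}\big)^{2}\), so, as with \(JR_{k}\), all \(D\) dual-objective scores can be obtained in \(O(|SV|\,D)\) time. A minimal NumPy sketch under the same assumptions as before (the array names `X_sv` and `alpha_sv` are ours, not the authors'):

```python
import numpy as np

def dual_objective_ranking_scores(X_sv, alpha_sv):
    """Dual-objective ranking scores JD_k from (42) for a trained linear SVDD model.

    X_sv     : (|SV|, D) array whose rows are the support vectors.
    alpha_sv : (|SV|,) array of the corresponding Lagrange multipliers (all > 0).
    Returns a (D,) array whose k-th entry is JD_k.
    """
    # First term of (42): sum_sv alpha_sv * x_{(sv,k)}^2
    first = alpha_sv @ (X_sv ** 2)
    # Double sum of (42) factorizes as (sum_sv alpha_sv * x_{(sv,k)})^2
    second = (alpha_sv @ X_sv) ** 2
    return first - second
```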


Cite this article

Zhang, L., Lu, X. New fast feature selection methods based on multiple support vector data description. Appl Intell 48, 1776–1790 (2018). https://doi.org/10.1007/s10489-017-1054-5
