Abstract
Feature selection can pick out informative features and thus yield good performance on high-dimensional data. Two feature selection methods based on support vector data description (SVDD) have been proposed for one-class classification problems: SVDD-radius-recursive feature elimination (SVDD-RRFE) and SVDD-dual-objective-recursive feature elimination (SVDD-DRFE). However, both SVDD-RRFE and SVDD-DRFE use samples from only one class even when given a multi-class classification task, and both suffer from high computational complexity. To remedy these issues, this paper extends SVDD-RRFE and SVDD-DRFE to binary and multi-class classification problems using multiple SVDD models, and proposes fast feature ranking schemes for them in the case of the linear kernel. Experimental results on toy, UCI, and microarray datasets show the efficiency and feasibility of the proposed methods.
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China under Grant No. 61373093, by the Natural Science Foundation of Jiangsu Province of China under Grant No. BK20140008, and by the Soochow Scholar Project.
Appendix
A. Proof of Theorem 1
Proof
Assume that a linear SVDD model has been trained. Then we can obtain the center \({\textbf a}\in \mathbb {R}^{D}\) of the hypersphere and the set of support vectors \(SV\). Substituting \(J_{r}\) (6) and \({J_{r}^{k}}\) (7) into the radius ranking score \({J_{R}^{k}}\), we have

\[
{J_{R}^{k}}=\frac{1}{|SV|}\sum_{{\textbf x}_{sv}\in SV}\left[R^{2}({\textbf x}_{sv})-R^{2}({\textbf x}^{k}_{sv})\right], \tag{31}
\]

where \({\textbf {x}}_{sv}\in \mathbb {R}^{D}\) is a support vector, and \({\textbf {x}}^{k}_{sv}=[x_{(sv,1)},\cdots ,x_{(sv,k-1)},x_{(sv,k+1)},\cdots ,x_{(sv,D)}]^{T} \in \mathbb {R}^{D-1}\) is the same vector with the \(k\)-th feature removed.

According to (31), it is necessary to find the difference \(R^{2}({\textbf {x}}_{sv})-R^{2}({\textbf {x}}^{k}_{sv})\). Let \(\textbf {a}^{k}=[a_{1},\cdots ,a_{k-1},a_{k+1},\cdots ,a_{D}]^{T} \in \mathbb {R}^{D-1}\). Then, we have

\[
R^{2}({\textbf x}_{sv})-R^{2}({\textbf x}^{k}_{sv})
=\|{\textbf x}_{sv}-{\textbf a}\|^{2}-\|{\textbf x}^{k}_{sv}-{\textbf a}^{k}\|^{2}
=\left(\|{\textbf x}_{sv}\|^{2}-\|{\textbf x}^{k}_{sv}\|^{2}\right)
-2\left({\textbf x}_{sv}^{T}{\textbf a}-({\textbf x}^{k}_{sv})^{T}{\textbf a}^{k}\right)
+\left(\|{\textbf a}\|^{2}-\|{\textbf a}^{k}\|^{2}\right). \tag{32}
\]

Since

\[
\|{\textbf x}_{sv}\|^{2}-\|{\textbf x}^{k}_{sv}\|^{2}=x_{(sv,k)}^{2}, \tag{33}
\]

\[
{\textbf x}_{sv}^{T}{\textbf a}-({\textbf x}^{k}_{sv})^{T}{\textbf a}^{k}=x_{(sv,k)}a_{k}, \tag{34}
\]

and

\[
\|{\textbf a}\|^{2}-\|{\textbf a}^{k}\|^{2}=a_{k}^{2}, \tag{35}
\]

we substitute (33), (34) and (35) into (32), and get

\[
R^{2}({\textbf x}_{sv})-R^{2}({\textbf x}^{k}_{sv})=x_{(sv,k)}^{2}-2x_{(sv,k)}a_{k}+a_{k}^{2}=\left(x_{(sv,k)}-a_{k}\right)^{2}. \tag{36}
\]

Then, substituting (36) into (31), the radius ranking score can be rewritten as:

\[
{J_{R}^{k}}=\frac{1}{|SV|}\sum_{{\textbf x}_{sv}\in SV}\left(x_{(sv,k)}-a_{k}\right)^{2}. \tag{37}
\]
This completes the proof of Theorem 1. □
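The identity behind Theorem 1, that deleting feature \(k\) changes the squared radius of a fixed linear SVDD model by exactly \((x_{(sv,k)}-a_{k})^{2}\), can be checked numerically. The following sketch uses synthetic data and randomly drawn coefficients (it is an illustration, not the authors' implementation), comparing the naive radius difference against the closed form:

```python
import numpy as np

rng = np.random.default_rng(0)
n, D, k = 5, 4, 2                      # support vectors, features, feature to drop

X_sv = rng.normal(size=(n, D))         # rows play the role of support vectors
alpha = rng.random(n)
alpha /= alpha.sum()                   # SVDD coefficients sum to one
a = alpha @ X_sv                       # hypersphere center: a = sum_i alpha_i x_i

# Naive score: average drop in squared radius when feature k is deleted
X_k = np.delete(X_sv, k, axis=1)
a_k = np.delete(a, k)
naive = np.mean(np.sum((X_sv - a) ** 2, axis=1)
                - np.sum((X_k - a_k) ** 2, axis=1))

# Closed form from Theorem 1: average of (x_{sv,k} - a_k)^2 over support vectors
fast = np.mean((X_sv[:, k] - a[k]) ** 2)

assert np.allclose(naive, fast)
```

The closed form needs only the \(k\)-th column and the \(k\)-th coordinate of the center, which is what makes the fast ranking scheme cheap for high-dimensional data.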
B. Proof of Theorem 2
Proof
Assume that a linear SVDD model has been trained. Then we can obtain the coefficients \(\alpha_{i}\), \(i=1,\cdots,n\), of the hypersphere and the set of support vectors \(SV=\{{\textbf x}_{i}\,|\,\alpha_{i}>0,\ i=1,\cdots,n\}\). Substituting \(J_{d}\) (9) and \({J_{d}^{k}}\) (10) into the dual-objective ranking score \({J_{D}^{k}}\), we have

\[
{J_{D}^{k}}=\left(\sum_{i=1}^{n}\alpha_{i}{\textbf x}_{i}^{T}{\textbf x}_{i}-\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_{i}\alpha_{j}{\textbf x}_{i}^{T}{\textbf x}_{j}\right)
-\left(\sum_{i=1}^{n}\alpha_{i}({\textbf x}^{k}_{i})^{T}{\textbf x}^{k}_{i}-\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_{i}\alpha_{j}({\textbf x}^{k}_{i})^{T}{\textbf x}^{k}_{j}\right). \tag{38}
\]

Since

\[
{\textbf x}_{i}^{T}{\textbf x}_{i}-({\textbf x}^{k}_{i})^{T}{\textbf x}^{k}_{i}=x_{(i,k)}^{2} \tag{39}
\]

and

\[
{\textbf x}_{i}^{T}{\textbf x}_{j}-({\textbf x}^{k}_{i})^{T}{\textbf x}^{k}_{j}=x_{(i,k)}x_{(j,k)}, \tag{40}
\]

we substitute (39) and (40) into (38), and get

\[
{J_{D}^{k}}=\sum_{i=1}^{n}\alpha_{i}x_{(i,k)}^{2}-\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_{i}\alpha_{j}x_{(i,k)}x_{(j,k)}. \tag{41}
\]

Since only support vectors (those with \(\alpha_{i}>0\)) contribute to computing (41), we rewrite (41) as follows:

\[
{J_{D}^{k}}=\sum_{{\textbf x}_{i}\in SV}\alpha_{i}x_{(i,k)}^{2}-\sum_{{\textbf x}_{i}\in SV}\sum_{{\textbf x}_{j}\in SV}\alpha_{i}\alpha_{j}x_{(i,k)}x_{(j,k)}. \tag{42}
\]
This completes the proof of Theorem 2. □
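Theorem 2 states that the drop in the SVDD dual objective from deleting feature \(k\) depends only on the \(k\)-th column of the data. This too can be checked numerically with a small synthetic sketch (random data and coefficients are illustrative assumptions, not the paper's experiments):

```python
import numpy as np

rng = np.random.default_rng(1)
n, D, k = 6, 5, 3                      # samples, features, feature to drop

X = rng.normal(size=(n, D))
alpha = rng.random(n)
alpha /= alpha.sum()                   # all alpha_i > 0, so every row is a "support vector"

def dual_objective(Xm):
    """Linear-kernel SVDD dual: sum_i a_i x_i.x_i - sum_ij a_i a_j x_i.x_j."""
    K = Xm @ Xm.T
    return alpha @ np.diag(K) - alpha @ K @ alpha

# Naive score: retrain-free difference of dual objectives with and without feature k
naive = dual_objective(X) - dual_objective(np.delete(X, k, axis=1))

# Closed form from Theorem 2, using only column k
fast = alpha @ (X[:, k] ** 2) - (alpha @ X[:, k]) ** 2

assert np.allclose(naive, fast)
```

Note that the closed form is a weighted variance of column \(k\) under the weights \(\alpha\), so each feature can be scored in \(O(n)\) time after a single model is trained.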
Cite this article
Zhang, L., Lu, X. New fast feature selection methods based on multiple support vector data description. Appl Intell 48, 1776–1790 (2018). https://doi.org/10.1007/s10489-017-1054-5