Sparse Support Vector Machine with L_p Penalty for Feature Selection

  • Regular Paper
  • Published in: Journal of Computer Science and Technology

Abstract

We study strategies for feature selection with the sparse support vector machine (SVM). Recently, the so-called L_p-SVM (0 < p < 1) has attracted much attention because it encourages better sparsity than the widely used L_1-SVM. However, L_p-SVM is a non-convex and non-Lipschitz optimization problem, and solving it numerically is challenging. In this paper, we reformulate the L_p-SVM as an optimization model with a linear objective function and smooth constraints (LOSC-SVM), so that it can be solved by numerical methods for smooth constrained optimization. Our numerical experiments on artificial datasets show that LOSC-SVM (0 < p < 1) can improve both feature selection and classification performance when a suitable parameter p is chosen. We also apply it to several real-life datasets, and the experimental results show that it is superior to L_1-SVM.
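For reference, the L_p-penalized soft-margin linear SVM discussed above is commonly written as follows (a standard formulation; the paper's exact model may differ in details such as the treatment of the bias term):

\min_{w, b, \xi} \; \|w\|_p^p + C \sum_{i=1}^{n} \xi_i \quad \text{s.t.} \quad y_i (w^\top x_i + b) \ge 1 - \xi_i, \; \xi_i \ge 0, \; i = 1, \dots, n,

where \|w\|_p^p = \sum_j |w_j|^p and 0 < p < 1. Because |w_j|^p is non-convex and non-Lipschitz at w_j = 0, smooth solvers cannot be applied to this form directly. One standard way to arrive at a linear objective with smooth constraints, in the spirit of the LOSC reformulation (the constraints used in the paper may differ), is to split w = u - v with u, v >= 0 and introduce t_j >= |w_j|^p, which is equivalent to the smooth constraint u_j + v_j <= t_j^{1/p} because 1/p > 1. The sketch below illustrates this idea with SciPy's SLSQP solver; the toy data, variable packing, and selection threshold are illustrative assumptions, not the authors' code.

import numpy as np
from scipy.optimize import minimize

# Toy data: 40 samples, 5 features; only the first two features are informative.
rng = np.random.default_rng(0)
n, d = 40, 5
X = rng.normal(size=(n, d))
y = np.sign(X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.normal(size=n))

p, C = 0.5, 1.0  # L_p exponent (0 < p < 1) and margin trade-off

# Variables packed as z = [u, v, t, b, xi], with w = u - v and u, v, t, xi >= 0.
def unpack(z):
    return z[:d], z[d:2*d], z[2*d:3*d], z[3*d], z[3*d + 1:]

def objective(z):
    u, v, t, b, xi = unpack(z)
    return t.sum() + C * xi.sum()  # linear objective: sum(t) + C * sum(xi)

def margin(z):
    u, v, t, b, xi = unpack(z)
    return y * (X @ (u - v) + b) - 1.0 + xi  # y_i((u-v)^T x_i + b) >= 1 - xi_i

def sparsity(z):
    u, v, t, b, xi = unpack(z)
    return t ** (1.0 / p) - (u + v)  # u_j + v_j <= t_j^(1/p), smooth since 1/p > 1

z0 = np.full(3 * d + 1 + n, 0.1)
bounds = [(0, None)] * (3 * d) + [(None, None)] + [(0, None)] * n
res = minimize(objective, z0, method="SLSQP", bounds=bounds,
               constraints=[{"type": "ineq", "fun": margin},
                            {"type": "ineq", "fun": sparsity}])
u, v, t, b, xi = unpack(res.x)
w = u - v
print("selected features:", np.flatnonzero(np.abs(w) > 1e-4))

At an optimum, minimizing sum(t) forces t_j = (u_j + v_j)^p and u_j * v_j = 0, so the objective equals ||w||_p^p + C * sum(xi), recovering the L_p-SVM. Note that the reformulation does not remove non-convexity: a local solver such as SLSQP returns a stationary point that depends on the starting point, so in practice one would try several initializations.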

Author information

Corresponding author

Correspondence to Feng Zeng.

Additional information

This work is supported in part by the National Natural Science Foundation of China under Grant Nos. 61502159, 61379057, 11101081, and 11271069, and the Research Foundation of Central South University of China under Grant No. 2014JSJJ019.


About this article

Cite this article

Yao, L., Zeng, F., Li, D.H. et al. Sparse Support Vector Machine with L_p Penalty for Feature Selection. J. Comput. Sci. Technol. 32, 68–77 (2017). https://doi.org/10.1007/s11390-017-1706-2

