
Efficient approaches for ℓ2-ℓ0 regularization and applications to feature selection in SVM

Published in: Applied Intelligence

Abstract

For solving a class of ℓ2-ℓ0-regularized problems, we convexify the nonconvex ℓ2-ℓ0 term with the help of its biconjugate function. The resulting convex program is given explicitly; it possesses a very simple structure and can be handled by convex optimization tools and standard software. Furthermore, to exploit simultaneously the advantages of convex and nonconvex approximation approaches, we propose a two-phase algorithm in which the convex relaxation is used in the first phase, and in the second phase an efficient DCA (Difference of Convex functions Algorithm) based algorithm is performed starting from the solution given by Phase 1. Applications to feature selection in support vector machine learning are presented, with experiments on several synthetic and real-world datasets. Comparative numerical results with standard algorithms show the efficiency and potential of the proposed approaches.
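The paper's explicit biconjugate-based convexification is not reproduced here. As an illustrative stand-in only, the sketch below follows the same two-phase scheme using a capped-ℓ1 DC approximation of the ℓ0 term (a standard approximation in this literature): Phase 1 solves a convex ℓ1-regularized SVM, and Phase 2 runs DCA iterations, each of which linearizes the concave part of the penalty and reduces to a reweighted-ℓ1 subproblem. All function names and parameter values are our own assumptions, not the authors' formulation.

```python
import numpy as np

def prox_weighted_l1(v, thresholds):
    """Soft-thresholding: proximal operator of a coordinate-weighted l1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - thresholds, 0.0)

def weighted_l1_svm(X, y, weights, lam, n_iter=500):
    """ISTA on (1/n)*sum squared-hinge loss + lam * sum_i weights_i * |w_i|."""
    n, d = X.shape
    step = n / (2.0 * np.linalg.norm(X, 2) ** 2)  # 1 / Lipschitz constant of the loss gradient
    w = np.zeros(d)
    for _ in range(n_iter):
        m = 1.0 - y * (X @ w)   # margins
        act = m > 0             # samples violating the margin
        grad = -(2.0 / n) * (X[act].T @ (y[act] * m[act]))
        w = prox_weighted_l1(w - step * grad, step * lam * weights)
    return w

def two_phase_svm(X, y, lam=0.1, alpha=5.0, dca_iters=10):
    d = X.shape[1]
    # Phase 1: convex relaxation -- an ordinary l1-regularized SVM.
    w = weighted_l1_svm(X, y, alpha * np.ones(d), lam)
    # Phase 2: DCA on the capped-l1 penalty min(1, alpha*|w_i|), started
    # from the Phase-1 solution. Linearizing the concave part yields a
    # reweighted-l1 subproblem: coordinates already above 1/alpha in
    # magnitude are no longer penalized.
    for _ in range(dca_iters):
        weights = np.where(alpha * np.abs(w) < 1.0, alpha, 0.0)
        w = weighted_l1_svm(X, y, weights, lam)
    return w

# Synthetic demo: 5 informative features out of 50.
rng = np.random.default_rng(0)
n, d, k = 200, 50, 5
w_true = np.zeros(d)
w_true[:k] = 1.0
X = rng.standard_normal((n, d))
y = np.sign(X @ w_true + 0.1 * rng.standard_normal(n))
w_hat = two_phase_svm(X, y)
accuracy = np.mean(np.sign(X @ w_hat) == y)
```

The design point the abstract emphasizes is visible in the sketch: the convex phase is cheap and gives a good (possibly over-shrunk) starting point, while the DCA phase removes the penalty bias on coordinates already identified as relevant, sharpening the sparsity pattern.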


Fig. 1



Acknowledgments

This research is funded by Foundation for Science and Technology Development of Ton Duc Thang University (FOSTECT), website: http://fostect.tdt.edu.vn, under Grant FOSTECT.2015.BR.15.

Author information


Corresponding author

Correspondence to Hoai An Le Thi.


About this article


Cite this article

Le Thi, H.A., Pham Dinh, T. & Thiao, M. Efficient approaches for ℓ2-ℓ0 regularization and applications to feature selection in SVM. Appl Intell 45, 549–565 (2016). https://doi.org/10.1007/s10489-016-0778-y

