Abstract
For solving a class of ℓ2-ℓ0 regularized problems, we convexify the nonconvex ℓ2-ℓ0 term with the help of its biconjugate function. The resulting convex program is given explicitly; it possesses a very simple structure and can be handled by convex optimization tools and standard software. Furthermore, to exploit simultaneously the advantages of convex and nonconvex approximation approaches, we propose a two-phase algorithm in which the convex relaxation is solved in the first phase, and in the second phase an efficient DCA (Difference of Convex functions Algorithm) based algorithm is started from the solution given by Phase 1. Applications to feature selection in support vector machine learning are presented, with experiments on several synthetic and real-world datasets. Comparative numerical results with standard algorithms show the efficiency and the potential of the proposed approaches.
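The two-phase scheme described above can be illustrated with a small sketch. The paper's biconjugate-based convex program is not reproduced in this excerpt, so the sketch below stands in a plain ℓ1 relaxation for Phase 1 and, for Phase 2, a DCA loop on a capped-ℓ1 surrogate of the ℓ0 term, min(|t|/θ, 1) = |t|/θ − max(|t|/θ − 1, 0); the function names, the least-squares data-fit term, and the parameter values are all illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

def soft_threshold(z, t):
    # Elementwise soft-thresholding: prox of t * |.|
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ista(X, y, lam, w0, lin=None, n_iter=500):
    # Proximal gradient on 0.5||Xw - y||^2 - <lin, w> + lam*||w||_1
    if lin is None:
        lin = np.zeros(X.shape[1])
    L = np.linalg.norm(X, 2) ** 2  # Lipschitz constant of the smooth part
    w = w0.copy()
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) - lin
        w = soft_threshold(w - grad / L, lam / L)
    return w

def two_phase_l2_l0(X, y, lam=0.1, theta=0.05, n_dca=20):
    # Phase 1: a convex relaxation (plain l1 here, standing in for the
    # paper's biconjugate-based convex program).
    w = ista(X, y, lam, np.zeros(X.shape[1]))
    # Phase 2: DCA on the capped-l1 surrogate of the l0 term,
    # phi(t) = min(|t|/theta, 1) = |t|/theta - max(|t|/theta - 1, 0),
    # warm-started from the Phase 1 solution.
    for _ in range(n_dca):
        # Subgradient of the concave part lam * sum max(|w_i|/theta - 1, 0)
        v = np.where(np.abs(w) > theta, lam * np.sign(w) / theta, 0.0)
        # Convex subproblem: 0.5||Xw-y||^2 + (lam/theta)||w||_1 - <v, w>
        w = ista(X, y, lam / theta, w, lin=v)
    return w
```

On well-conditioned data, Phase 2 typically removes the shrinkage bias of the ℓ1 relaxation: entries above the threshold θ are debiased toward their unpenalized values, while small entries are driven to zero.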
Acknowledgments
This research is funded by the Foundation for Science and Technology Development of Ton Duc Thang University (FOSTECT), website: http://fostect.tdt.edu.vn, under Grant FOSTECT.2015.BR.15.
Cite this article
Le Thi, H.A., Pham Dinh, T. & Thiao, M. Efficient approaches for ℓ 2-ℓ 0 regularization and applications to feature selection in SVM. Appl Intell 45, 549–565 (2016). https://doi.org/10.1007/s10489-016-0778-y