Abstract
Feature selection for logistic regression (LR) remains a challenging problem. In this paper, we present a new feature selection method for logistic regression based on a combination of zero-norm and l2-norm regularization. Because the discontinuity of the zero-norm makes it difficult to find the optimal solution directly, we apply a suitable nonconvex approximation of the zero-norm to derive a robust difference of convex functions (DC) program. A DC optimization algorithm (DCA) is then used to solve the problem efficiently, and the corresponding DCA converges linearly. Numerical experiments on benchmark datasets show that, compared with traditional methods, the proposed method reduces the number of input features while maintaining accuracy. Furthermore, as a practical application, the proposed method is used to classify licorice seeds directly from near-infrared spectroscopy data. The results in different spectral regions illustrate that the proposed method achieves classification performance equivalent to that of traditional logistic regression while suppressing more features. These results demonstrate the feasibility and effectiveness of the proposed method.
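As a minimal sketch of the formulation (assuming the standard setting; the surrogate function and parameter names below are illustrative, not taken verbatim from the paper), the method combines the logistic loss with zero-norm and l2-norm penalties and replaces the discontinuous zero-norm with a smooth nonconvex surrogate that admits a DC decomposition:

\min_{w,\,b}\;\sum_{i=1}^{m}\log\bigl(1+\exp\bigl(-y_i(w^{\top}x_i+b)\bigr)\bigr)+\lambda_1\|w\|_0+\lambda_2\|w\|_2^2,
\qquad
\|w\|_0\;\approx\;\sum_{j=1}^{n}\bigl(1-e^{-\theta|w_j|}\bigr),

where \lambda_1, \lambda_2 are regularization weights and \theta controls the tightness of the exponential approximation; writing the surrogate as a difference of convex functions is what makes the problem amenable to DCA.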







Acknowledgments
This work is supported by the National Natural Science Foundation of China (11471010, 11271367).
Appendix: The primal-dual interior-point method for solving convex problem (32)
Note that p(y = 1|x) + p(y = −1|x) = 1, and thus problem (32) can be written as:
where:
Let x = (b, w, t) with x ∈ R^{2n+1} and
Then the problem (42) is equivalent to
Introducing the Lagrange multiplier s with components s_i (s_i ≥ 0), the Lagrangian function for problem (44) can be expressed as
where
where I_{n×n} is the n×n identity matrix and 0_{n×1} denotes the n×1 zero vector. The first-order necessary optimality conditions for (44) are
where
where ξ_{n×1} denotes a real n×1 vector. Letting Ax = z, the above system (47) can be written as
According to the primal-dual interior-point algorithm, we replace s^T z = 0 by s_i z_i = μ (μ > 0) in the system (50), and then obtain
where z_i is the i-th component of the variable z. The above system (52) is a perturbation of the first-order optimality conditions (47). For a fixed μ, Newton's method is used to solve this system, and μ is then decreased toward 0, so that we obtain an approximate solution of the system (47). Moreover, at each iteration the Newton direction is obtained by solving:
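The linear systems referenced above did not survive extraction. As a purely illustrative sketch (not the paper's problem (32) or its KKT system), the Python snippet below shows the generic primal-dual interior-point structure just described on a one-dimensional toy problem, minimize x^2 subject to x ≥ 1: the complementarity condition s·z = 0 is perturbed to s·z = μ, the perturbed system is solved by Newton's method for each fixed μ, and μ is then driven toward 0.

import numpy as np

def kkt_residual(x, s, mu):
    """Perturbed KKT residual: stationarity and perturbed complementarity."""
    return np.array([2.0 * x - s,            # d/dx [x^2 + s*(1 - x)] = 0
                     s * (x - 1.0) - mu])    # s * z = mu, with slack z = x - 1

def kkt_jacobian(x, s):
    """Jacobian of the perturbed KKT residual with respect to (x, s)."""
    return np.array([[2.0, -1.0],
                     [s,   x - 1.0]])

def primal_dual_ipm(x=2.0, s=1.0, mu=1.0, mu_min=1e-10, tol=1e-10):
    while mu > mu_min:
        # Inner Newton loop for the current perturbation parameter mu.
        for _ in range(50):
            r = kkt_residual(x, s, mu)
            if np.linalg.norm(r) < tol:
                break
            dx, ds = np.linalg.solve(kkt_jacobian(x, s), -r)
            # Fraction-to-boundary rule: keep the slack z = x - 1 and the
            # multiplier s strictly positive along the Newton direction.
            alpha = 1.0
            if dx < 0:
                alpha = min(alpha, -0.95 * (x - 1.0) / dx)
            if ds < 0:
                alpha = min(alpha, -0.95 * s / ds)
            x, s = x + alpha * dx, s + alpha * ds
        mu *= 0.1  # decrease the perturbation toward the true KKT conditions
    return x, s

if __name__ == "__main__":
    x_opt, s_opt = primal_dual_ipm()
    print(f"x* ~ {x_opt:.6f}, s* ~ {s_opt:.6f}")  # expected: x* ~ 1, s* ~ 2

The fraction-to-boundary step keeps the slack and the multiplier strictly positive throughout, mirroring the interior-point requirement in the scheme described above.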
Cite this article
Yang, L., Qian, Y. A sparse logistic regression framework by difference of convex functions programming. Appl Intell 45, 241–254 (2016). https://doi.org/10.1007/s10489-016-0758-2