Abstract
Feature selection for logistic regression (LR) remains a challenging problem. In this paper, we present a new feature selection method for logistic regression based on a combination of zero-norm and l2-norm regularization. Because the discontinuity of the zero-norm makes it difficult to find the optimal solution directly, we apply a suitable nonconvex approximation of the zero-norm to derive a robust difference of convex functions (DC) program. A DC optimization algorithm (DCA) is then used to solve the problem efficiently, and the corresponding DCA converges linearly. Numerical experiments on benchmark datasets show that, compared with traditional methods, the proposed method reduces the number of input features while maintaining accuracy. Furthermore, as a practical application, the proposed method is used to classify licorice seeds directly from near-infrared spectroscopy data. The results in different spectral regions illustrate that the proposed method achieves classification performance equivalent to that of traditional logistic regression while suppressing more features. These results demonstrate the feasibility and effectiveness of the proposed method.
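As a minimal sketch of the formulation (assuming the standard setting; the surrogate function and parameter names below are illustrative, not taken verbatim from the paper), the method combines the logistic loss with zero-norm and l2-norm penalties and replaces the discontinuous zero-norm with a smooth nonconvex surrogate that admits a DC decomposition:

\min_{w,\,b}\;\sum_{i=1}^{m}\log\bigl(1+\exp\bigl(-y_i(w^{\top}x_i+b)\bigr)\bigr)+\lambda_1\|w\|_0+\lambda_2\|w\|_2^2,
\qquad
\|w\|_0\;\approx\;\sum_{j=1}^{n}\bigl(1-e^{-\theta|w_j|}\bigr),

where \lambda_1, \lambda_2 are regularization weights and \theta controls the tightness of the exponential approximation; writing the surrogate as a difference of convex functions is what makes the problem amenable to DCA.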







Acknowledgments
This work is supported by the National Natural Science Foundation of China (11471010, 11271367).
Appendix: The primal-dual interior-point method for solving convex problem (32)
Note that p(y = 1|x) + p(y = −1|x) = 1, and thus problem (32) can be written as:
where:
Let x = (b, w, t) with x ∈ R^{2n+1} and
Then the problem (42) is equivalent to
Introducing the Lagrange multiplier s with components s_i (s_i ≥ 0), the Lagrangian function for problem (44) can be expressed as
where
where I_{n×n} is the n×n identity matrix and 0_{n×1} denotes the n×1 zero vector. The first-order necessary optimality conditions for (44) are
where
where ξ_{n×1} denotes a real n×1 vector. Letting Ax = z, the above system (47) can be written as
According to the primal-dual interior-point algorithm, we replace s^T z = 0 by s_i z_i = μ (μ > 0) in the system (50), and then obtain
where z_i is the i-th component of the variable z. The above system (52) is a perturbation of the first-order optimality conditions (47). For a fixed μ, Newton's method is used to solve this system, and μ is then decreased toward 0, so that we obtain an approximate solution of the system (47). Moreover, at each iteration the Newton direction is obtained by solving:
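The linear systems referenced above did not survive extraction. As a purely illustrative sketch (not the paper's problem (32) or its KKT system), the Python snippet below shows the generic primal-dual interior-point structure just described on a one-dimensional toy problem, minimize x^2 subject to x ≥ 1: the complementarity condition s·z = 0 is perturbed to s·z = μ, the perturbed system is solved by Newton's method for each fixed μ, and μ is then driven toward 0.

import numpy as np

def kkt_residual(x, s, mu):
    """Perturbed KKT residual: stationarity and perturbed complementarity."""
    return np.array([2.0 * x - s,            # d/dx [x^2 + s*(1 - x)] = 0
                     s * (x - 1.0) - mu])    # s * z = mu, with slack z = x - 1

def kkt_jacobian(x, s):
    """Jacobian of the perturbed KKT residual with respect to (x, s)."""
    return np.array([[2.0, -1.0],
                     [s,   x - 1.0]])

def primal_dual_ipm(x=2.0, s=1.0, mu=1.0, mu_min=1e-10, tol=1e-10):
    while mu > mu_min:
        # Inner Newton loop for the current perturbation parameter mu.
        for _ in range(50):
            r = kkt_residual(x, s, mu)
            if np.linalg.norm(r) < tol:
                break
            dx, ds = np.linalg.solve(kkt_jacobian(x, s), -r)
            # Fraction-to-boundary rule: keep the slack z = x - 1 and the
            # multiplier s strictly positive along the Newton direction.
            alpha = 1.0
            if dx < 0:
                alpha = min(alpha, -0.95 * (x - 1.0) / dx)
            if ds < 0:
                alpha = min(alpha, -0.95 * s / ds)
            x, s = x + alpha * dx, s + alpha * ds
        mu *= 0.1  # decrease the perturbation toward the true KKT conditions
    return x, s

if __name__ == "__main__":
    x_opt, s_opt = primal_dual_ipm()
    print(f"x* ~ {x_opt:.6f}, s* ~ {s_opt:.6f}")  # expected: x* ~ 1, s* ~ 2

The fraction-to-boundary step keeps the slack and the multiplier strictly positive throughout, mirroring the interior-point requirement in the scheme described above.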
Cite this article
Yang, L., Qian, Y. A sparse logistic regression framework by difference of convex functions programming. Appl Intell 45, 241–254 (2016). https://doi.org/10.1007/s10489-016-0758-2