Abstract
Sparse multinomial logistic regression has recently received widespread attention. It provides a useful tool for solving multi-class classification problems in various fields, such as signal and image processing, machine learning, and disease diagnosis. In this paper, we first study the group sparse multinomial logistic regression model and establish its optimality conditions. Based on these theoretical results, we then propose an efficient algorithm, called the subspace quadratic regularization algorithm, for computing a stationary point of a given problem. This algorithm enjoys excellent convergence properties, including global convergence and local quadratic convergence. Finally, our numerical results on standard benchmark data clearly demonstrate the superior performance of the proposed algorithm in terms of logistic loss value, sparsity recovery, and computational time.
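For concreteness, the two quantities the abstract refers to, the multinomial logistic loss and the group sparsity of the coefficient matrix, can be sketched in NumPy as follows. This is a minimal illustration under the usual convention that class \(m\) serves as the reference class with score zero; the function names are ours, not from the paper.

```python
import numpy as np

def multinomial_logistic_loss(W, X, y, m):
    """Negative log-likelihood of the multinomial logit model.

    W : (p, m-1) coefficient matrix (class m-1 in 0-based labels is the
        reference class with linear score fixed at 0).
    X : (n, p) design matrix; y : (n,) labels in {0, ..., m-1}.
    """
    n = X.shape[0]
    Z = X @ W                                        # (n, m-1) linear scores
    scores = np.hstack([Z, np.zeros((n, 1))])        # append reference-class score 0
    shift = scores.max(axis=1, keepdims=True)        # log-sum-exp shift for stability
    log_norm = shift.ravel() + np.log(np.exp(scores - shift).sum(axis=1))
    return float(np.sum(log_norm - scores[np.arange(n), y]))

def group_sparsity(W, tol=1e-12):
    """l_{2,0} group sparsity: number of nonzero rows (feature groups) of W."""
    return int(np.sum(np.linalg.norm(W, axis=1) > tol))
```

At the zero matrix all classes are equiprobable, so the loss equals \(n\log m\); `group_sparsity` counts the features selected jointly across all classes, which is the quantity the group-sparse model constrains.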








References
Bahmani, S., Raj, B., Boufounos, P.: Greedy sparsity-constrained optimization. J. Mach. Learn. Res. 14, 807–841 (2013)
Beck, A., Hallak, N.: Optimization problems involving group sparsity terms. Math. Program. 178, 39–67 (2019)
Blondel, M., Seki, K., Uehara, K.: Block coordinate descent algorithms for large-scale sparse multiclass classification. Mach. Learn. 93, 31–52 (2013)
Böhning, D.: Multinomial logistic regression algorithm. Ann. Inst. Stat. Math. 44(1), 197–200 (1992)
Breheny, P., Huang, J.: Group descent algorithms for nonconvex penalized linear and logistic regression models with grouped predictors. Stat. Comput. 25(2), 173–187 (2015)
Byrne, E., Schniter, P.: Sparse multinomial logistic regression via approximate message passing. IEEE Trans. Signal Process. 64(21), 5485–5498 (2016)
Cawley, G.C., Talbot, N.L., Girolami, M.: Sparse multinomial logistic regression via Bayesian L1 regularisation. In: Adv. Neural Inf. Process. Syst., pp. 209–216 (2007)
Chen, X.J., Pan, L.L., Xiu, N.H.: Solution sets of three sparse optimization problems for multivariate regression. Technical Report, Department of Applied Mathematics, The Hong Kong Polytechnic University (2020)
Freedman, D.A.: Statistical Models: Theory and Practice. Cambridge University Press, Cambridge (2009)
Friedman, J., Hastie, T., Tibshirani, R.: Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33(1), 1–22 (2010)
Harrell, F.E., Jr.: Regression Modeling Strategies: With Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis. Springer, Berlin (2015)
Kanzow, C., Qi, H.D.: A QP-free constrained Newton-type method for variational inequality problems. Math. Program. 85(1), 81–106 (1999)
Krishnapuram, B., Carin, L., Figueiredo, M., Hartemink, A.: Sparse multinomial logistic regression: fast algorithms and generalization bounds. IEEE Trans. Pattern Anal. Mach. Intell. 27(6), 957–968 (2005)
Kwak, C., Clayton-Matthews, A.: Multinomial logistic regression. Nurs. Res. 51(6), 404–410 (2002)
Lee, J.D., Sun, Y., Saunders, M.A.: Proximal Newton-type methods for minimizing composite functions. SIAM J. Optim. 24, 1420–1443 (2014)
Li, F.F., Fergus, R., Perona, P.: Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object categories. In: 2004 Conference on Computer Vision and Pattern Recognition Workshop, IEEE (2004)
Li, H., Lin, Z.: Accelerated proximal gradient methods for nonconvex programming. In: Adv. Neural Inf. Process. Syst., pp. 379–387 (2015)
Li, J., Bioucas-Dias, J.M., Plaza, A.: Semi-supervised hyperspectral image classification using a new (soft) sparse multinomial logistic regression model. In: 2011 3rd Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (WHISPERS), pp. 1–4. IEEE (2011)
McCullagh, P., Nelder, J.A.: Generalized Linear Models, 2nd edn. Chapman and Hall, New York (1989)
Moré, J.J., Sorensen, D.C.: Computing a trust region step. SIAM J. Sci. Stat. Comput. 4(3), 553–572 (1983)
Natarajan, B.K.: Sparse approximate solutions to linear systems. SIAM J. Comput. 24(2), 227–234 (1995)
Nocedal, J., Wright, S.J.: Numerical Optimization, 2nd edn. Springer, New York (2006)
Obozinski, G., Taskar, B., Jordan, M.: Joint covariate selection and joint subspace selection for multiple classification problems. Stat. Comput. 20(2), 231–252 (2010)
Pang, T., Nie, F., Han, J., Li, X.: Efficient feature selection via \(l_{2,0}\)-norm constrained sparse regression. IEEE Trans. Knowl. Data Eng. 31(5), 880–893 (2019)
Rockafellar, R.T., Wets, R.J.: Variational Analysis. Springer, New York (1998)
Ryali, S., Supekar, K., Abrams, D.A., Menon, V.: Sparse logistic regression for whole-brain classification of fMRI data. NeuroImage 51(2), 752–764 (2010)
Simon, N., Friedman, J., Hastie, T.: A blockwise descent algorithm for group-penalized multiresponse and multinomial regression. arXiv preprint arXiv:1311.6529 (2013)
Sun, Y., Babu, P., Palomar, D.P.: Majorization-minimization algorithms in signal processing, communications, and machine learning. IEEE Trans. Signal Process. 65(3), 794–816 (2016)
Tutz, G., Pößnecker, W., Uhlmann, L.: Variable selection in general multinomial logit models. Comput. Stat. Data Anal. 82, 207–222 (2015)
Vincent, M., Hansen, N.R.: Sparse group lasso and high dimensional multinomial classification. Comput. Stat. Data Anal. 71, 771–786 (2014)
Wang, R., Xiu, N.H., Zhou, S.L.: An extended Newton-type algorithm for l2-regularized sparse logistic regression and its efficiency for classifying large-scale datasets. J. Comput. Appl. Math. 397, 113656 (2021)
Acknowledgements
We sincerely thank the associate editor and two referees for their detailed comments, which have helped to improve this paper. The research of Rui Wang and Naihua Xiu is partially supported by the National Natural Science Foundation of China (11971052) and the Beijing Natural Science Foundation (Z190002); the research of Kim-Chuan Toh is partially supported by the Ministry of Education of Singapore under ARF Grant Number R-146-000-257-112.
Appendix A: Proofs of Lemmas 1 and 2
Proof of Lemma 1:
Since parts (i)–(iii) of Lemma 1 follow by direct calculation, we omit their proofs and give a detailed derivation of part (iv).
Denote
If no confusion arises, we use the simple notation \(A=A(W),\ B=B(W),\ C=C(W)\). Since \(A^{(k)}=C^{(k)}-\sum _{j=1,\, j\ne k}^{m-1}B^{(k,j)}\) for \(k=1,2,\ldots ,m-1\), we have
Therefore, we can rewrite the Hessian matrix as
For any \(z=(z_1; z_2;\ldots ;z_{m-1}) \in {\mathbb {R}}^{p(m-1)}\) with \(z_i \in {\mathbb {R}}^{p},\ i=1,2,\ldots ,(m-1)\), we have
which means that \(\nabla ^{2}\ell (W)\succeq {\mathbf {0}}\).
To verify the second inequality of (3), we invoke the result of Böhning [4], who showed that the Hessian matrix is bounded above by a positive definite matrix that does not depend on \(W\), i.e.,
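For reference, a standard statement of Böhning's bound (our reconstruction of the classical result from [4] in the notation of this paper, with \(m\) classes and design matrix \(X\); the paper's own display is not reproduced here) reads

```latex
\nabla^{2}\ell(W)\ \preceq\
\frac{1}{2}\Bigl(I_{m-1}-\tfrac{1}{m}\,\mathbf{1}_{m-1}\mathbf{1}_{m-1}^{\top}\Bigr)
\otimes X^{\top}X ,
```

where \(\mathbf{1}_{m-1}\) is the all-ones vector and \(\otimes\) denotes the Kronecker product; the right-hand side is a fixed positive definite matrix, as required.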
We next prove the Lipschitz continuity of the Hessian matrix. Denote
where \(\mathbf{z}=(\mathbf{z}^{1},\mathbf{z}^{2},\ldots ,\mathbf{z}^{m-1})^{\top }=W^{\top }x_i \in {\mathbb {R}}^{m-1}\) with \(\mathbf{z}^k={w^{(k)}}^{\top }x_i\) for \(i \in \{1,2,\ldots ,n\}\). We first consider the following two cases.
Case 1. If \(j\ne k\),
Case 2. If \(j= k\),
Hence, \(\Vert \nabla h(\mathbf{z } )\Vert \le 2\sqrt{m-1}.\) Then by the mean value theorem, there exists \(\theta \in (0,1)\) such that for any \(i \in \{1,2,\ldots ,n\}\),
where \(\varpi := 2\sqrt{m-1}\Vert X\Vert _\infty\). Similarly, we have
Notice that for any \(k,j \in \{1,2,\ldots ,m-1\}\),
Therefore,
which completes the proof of the lemma. \(\square\)
Proof of Lemma 2:
For any \(W,D\in {\mathbb {R}}^{p\times {(m-1)}}\), there exists \(\bar{t} \in [0,1]\) such that, with \(\varXi = W+\bar{t}(D-W)\),
Let t be a scalar parameter and \(g(t)=\ell (W+tD)\). The chain rule yields
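As a sketch, the chain-rule computation for \(g(t)=\ell (W+tD)\) takes the standard form (our reconstruction from elementary calculus, not the paper's display):

```latex
g'(t)=\bigl\langle \nabla \ell (W+tD),\, D\bigr\rangle ,\qquad
g''(t)=\bigl\langle D,\, \nabla ^{2}\ell (W+tD)\, D\bigr\rangle ,
```

so that the second-order bound on \(\ell\) follows from the bound on \(\nabla ^{2}\ell\) established in Lemma 1.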
Then, we have
This completes the proof. \(\square\)
Cite this article
Wang, R., Xiu, N. & Toh, KC. Subspace quadratic regularization method for group sparse multinomial logistic regression. Comput Optim Appl 79, 531–559 (2021). https://doi.org/10.1007/s10589-021-00287-2