Abstract
We propose a general framework of iteratively reweighted \(\ell _1\) algorithms for solving \(\ell _p\) regularization problems. We show that all limit points of the iterates generated by the proposed algorithms have the same sign. Moreover, after sufficiently many iterations, the iterates also have the same sign as the limit points, and their nonzero components are bounded away from zero. Therefore, the algorithm behaves like a method for solving a smooth problem in the reduced space consisting of the nonzero components. We analyze the global convergence and the worst-case complexity of the reweighted algorithms. In addition, we propose a smoothing parameter updating strategy that automatically stops reducing the smoothing parameters corresponding to the zero components of the limit points. We show that the \(\ell _p\) regularized regression problem is locally equivalent to a weighted \(\ell _1\) regularization problem near a stationary point, and that every stationary point corresponds to a Maximum A Posteriori estimation with independent, non-identically distributed Laplace priors. Numerical experiments illustrate the behavior and efficiency of the proposed algorithms.
Acknowledgements
Hao Wang was supported by the Young Scientists Fund of the National Natural Science Foundation of China under Grant 12001367.
Appendix
We discuss two common examples to show that the iterates \(\{x^k\}\) generated by Algorithm 1 generally have a unique limit point. By Theorem 6, it suffices to verify that, for a stationary point \(x^*\), \(\nabla ^2 F( [x_{\mathcal{I}^*}^*; 0_{\mathcal{A}^*}]) \) is invertible.
\(\ell _p\) regularized linear regression The loss function is \(f(x) = \tfrac{1}{2}\Vert Ax-b\Vert ^2_2\) with \(A\in \mathbb {R}^{m\times n}\), \(x\in \mathbb {R}^n\) and \(b\in \mathbb {R}^m\). Therefore, \( \nabla ^2 F( [x_{\mathcal{I}^*}^*; 0_{\mathcal{A}^*}]) = [A^TA]_{\mathcal{I}^*\mathcal{I}^*} + \lambda \nabla ^2 \Vert x_{\mathcal{I}^*}^*\Vert ^p_p\). In this case, the algorithm converges uniquely to the stationary point as long as \( [A^TA]_{\mathcal{I}^*\mathcal{I}^*} + \lambda \nabla ^2 \Vert x_{\mathcal{I}^*}^*\Vert ^p_p\) is nonsingular.
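As a minimal numerical sketch of this check, the snippet below assembles the reduced Hessian \([A^TA]_{\mathcal{I}^*\mathcal{I}^*} + \lambda \nabla ^2 \Vert x_{\mathcal{I}^*}^*\Vert ^p_p\) on the support of a candidate stationary point and tests it for nonsingularity. The helper name `reduced_hessian_lsq` and the random instance are illustrative assumptions, not part of the paper.

```python
import numpy as np

def reduced_hessian_lsq(A, x_star, lam, p, tol=1e-10):
    """Reduced Hessian [A^T A]_{II} + lam * diag(p(p-1)|x_i|^{p-2})
    over the support I = {i : |x_i^*| > tol} of a stationary point."""
    I = np.flatnonzero(np.abs(x_star) > tol)
    AI = A[:, I]
    # second derivative of |t|^p at t != 0 is p(p-1)|t|^{p-2}
    d2 = p * (p - 1.0) * np.abs(x_star[I]) ** (p - 2.0)
    return AI.T @ AI + lam * np.diag(d2)

# illustrative random instance with a sparse candidate stationary point
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
x_star = np.array([1.5, 0.0, -0.8, 0.0, 2.0])
H = reduced_hessian_lsq(A, x_star, lam=0.1, p=0.5)
nonsingular = np.linalg.matrix_rank(H) == H.shape[0]
```

Note that for \(0<p<1\) the curvature term \(p(p-1)|x_i|^{p-2}\) is negative, so nonsingularity of the sum is genuinely a condition to verify rather than an automatic consequence of \(A^TA\succeq 0\).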
\(\ell _p\) regularized logistic regression The loss function is the negative log-likelihood
$$f(x) = -\sum _{i=1}^{m}\bigl [y_i\log \sigma (a_i^Tx)+(1-y_i)\log (1-\sigma (a_i^Tx))\bigr ],$$
with \(a_i\in \mathbb {R}^n\), \(y_i\in \{0,1\}\) and \(\sigma (s) = \frac{1}{1+e^{-s}}\). Then \([\nabla ^2 f(x^*)]_{\mathcal{I}^*\mathcal{I}^*} = [A^TD(x^*)A]_{\mathcal{I}^*\mathcal{I}^*}\), where \(D(x^*) = \text {diag}(\sigma (a_i^Tx^*)(1-\sigma (a_i^Tx^*)), i\in \{1,\ldots ,m\})\) and \(A = [a_1, \ldots , a_m]^T\). In this case, the algorithm converges uniquely to the stationary point as long as \([A^TD(x^*)A]_{\mathcal{I}^*\mathcal{I}^*} + \lambda \nabla ^2 \Vert x_{\mathcal{I}^*}^*\Vert ^p_p\) is nonsingular.
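The same invertibility check can be sketched for the logistic loss; `reduced_hessian_logistic` below is a hypothetical helper assembling \([A^TD(x^*)A]_{\mathcal{I}^*\mathcal{I}^*} + \lambda \nabla ^2 \Vert x_{\mathcal{I}^*}^*\Vert ^p_p\) under the same illustrative assumptions (random data, hand-picked sparse point).

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def reduced_hessian_logistic(A, x_star, lam, p, tol=1e-10):
    """Reduced Hessian [A^T D(x^*) A]_{II} + lam * diag(p(p-1)|x_i|^{p-2}),
    where D has diagonal sigma(a_i^T x^*)(1 - sigma(a_i^T x^*))."""
    I = np.flatnonzero(np.abs(x_star) > tol)
    s = sigmoid(A @ x_star)
    D = s * (1.0 - s)                  # length-m vector of diagonal entries
    AI = A[:, I]
    d2 = p * (p - 1.0) * np.abs(x_star[I]) ** (p - 2.0)
    return AI.T @ (D[:, None] * AI) + lam * np.diag(d2)

# illustrative instance: rows of A are the samples a_i^T
rng = np.random.default_rng(1)
A = rng.standard_normal((50, 4))
x_star = np.array([0.9, 0.0, -1.2, 0.0])
H = reduced_hessian_logistic(A, x_star, lam=0.05, p=0.5)
nonsingular = np.linalg.matrix_rank(H) == H.shape[0]
```

Since every diagonal entry of \(D(x^*)\) lies in \((0, 1/4]\), the data term is positive semidefinite, and the check again concerns whether the negative \(\ell _p\) curvature on the support destroys invertibility.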
Wang, H., Zeng, H., Wang, J. et al. Relating \(\ell _p\) regularization and reweighted \(\ell _1\) regularization. Optim Lett 15, 2639–2660 (2021). https://doi.org/10.1007/s11590-020-01685-x