Abstract
We propose a general framework of iteratively reweighted \(\ell _1\) algorithms for solving \(\ell _p\) regularization problems. We show that all limit points of the iterates generated by the proposed algorithms have the same sign. Moreover, after sufficiently many iterations, the iterates also have the same sign as the limit points, and their nonzero components are bounded away from zero. Therefore, the algorithm behaves like a method for solving a smooth problem in the reduced space consisting of the nonzero components. We analyze the global convergence and the worst-case complexity of the reweighted algorithms. In addition, we propose a smoothing parameter updating strategy that automatically stops reducing the smoothing parameters corresponding to the zero components of the limit points. We show that the \(\ell _p\) regularized regression problem is locally equivalent to a weighted \(\ell _1\) regularization problem near a stationary point, and that every stationary point corresponds to a Maximum A Posteriori estimation with independent, non-identically distributed Laplace priors. Numerical experiments illustrate the behavior and efficiency of the proposed algorithms.
Acknowledgements
Hao Wang was supported by the Young Scientists Fund of the National Natural Science Foundation of China under Grant 12001367.
Appendix
We discuss two common examples to show that the iterates \(\{x^k\}\) generated by Algorithm 1 generally have a unique limit point. By Theorem 6, it suffices to verify that, for a stationary point \(x^*\), \(\nabla ^2 F( [x_{\mathcal{I}^*}^*; 0_{\mathcal{A}^*}]) \) is invertible.
\(\ell _p\) regularized linear regression The loss function is \(f(x) = \tfrac{1}{2}\Vert Ax-b\Vert ^2_2\) with \(A\in \mathbb {R}^{m\times n}\), \(x\in \mathbb {R}^n\) and \(b\in \mathbb {R}^m\). Therefore, \( \nabla ^2 F( [x_{\mathcal{I}^*}^*; 0_{\mathcal{A}^*}]) = [A^TA]_{\mathcal{I}^*\mathcal{I}^*} + \lambda \nabla ^2 \Vert x_{\mathcal{I}^*}^*\Vert ^p_p\). In this case, the algorithm converges uniquely to the stationary point as long as \( [A^TA]_{\mathcal{I}^*\mathcal{I}^*} + \lambda \nabla ^2 \Vert x_{\mathcal{I}^*}^*\Vert ^p_p\) is nonsingular.
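As a minimal numerical sketch of this check, the snippet below assembles the reduced Hessian \([A^TA]_{\mathcal{I}^*\mathcal{I}^*} + \lambda \nabla ^2 \Vert x_{\mathcal{I}^*}^*\Vert ^p_p\) on the support of a candidate stationary point and tests it for nonsingularity. The helper name `reduced_hessian_lsq` and the random instance are illustrative assumptions, not part of the paper.

```python
import numpy as np

def reduced_hessian_lsq(A, x_star, lam, p, tol=1e-10):
    """Reduced Hessian [A^T A]_{II} + lam * diag(p(p-1)|x_i|^{p-2})
    over the support I = {i : |x_i^*| > tol} of a stationary point."""
    I = np.flatnonzero(np.abs(x_star) > tol)
    AI = A[:, I]
    # second derivative of |t|^p at t != 0 is p(p-1)|t|^{p-2}
    d2 = p * (p - 1.0) * np.abs(x_star[I]) ** (p - 2.0)
    return AI.T @ AI + lam * np.diag(d2)

# illustrative random instance with a sparse candidate stationary point
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
x_star = np.array([1.5, 0.0, -0.8, 0.0, 2.0])
H = reduced_hessian_lsq(A, x_star, lam=0.1, p=0.5)
nonsingular = np.linalg.matrix_rank(H) == H.shape[0]
```

Note that for \(0<p<1\) the curvature term \(p(p-1)|x_i|^{p-2}\) is negative, so nonsingularity of the sum is genuinely a condition to verify rather than an automatic consequence of \(A^TA\succeq 0\).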
\(\ell _p\) regularized logistic regression The loss function is the negative log-likelihood
$$f(x) = -\sum _{i=1}^{m}\bigl [y_i\log \sigma (a_i^Tx)+(1-y_i)\log (1-\sigma (a_i^Tx))\bigr ],$$
with \(a_i\in \mathbb {R}^n\), \(y_i\in \{0,1\}\) and \(\sigma (s) = \frac{1}{1+e^{-s}}\). Then \([\nabla ^2 f(x^*)]_{\mathcal{I}^*\mathcal{I}^*} = [A^TD(x^*)A]_{\mathcal{I}^*\mathcal{I}^*}\), where \(D(x^*) = \text {diag}(\sigma (a_i^Tx^*)(1-\sigma (a_i^Tx^*)), i\in \{1,\ldots ,m\})\) and \(A = [a_1, \ldots , a_m]^T\). In this case, the algorithm converges uniquely to the stationary point as long as \([A^TD(x^*)A]_{\mathcal{I}^*\mathcal{I}^*} + \lambda \nabla ^2 \Vert x_{\mathcal{I}^*}^*\Vert ^p_p\) is nonsingular.
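The same invertibility check can be sketched for the logistic loss; `reduced_hessian_logistic` below is a hypothetical helper assembling \([A^TD(x^*)A]_{\mathcal{I}^*\mathcal{I}^*} + \lambda \nabla ^2 \Vert x_{\mathcal{I}^*}^*\Vert ^p_p\) under the same illustrative assumptions (random data, hand-picked sparse point).

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def reduced_hessian_logistic(A, x_star, lam, p, tol=1e-10):
    """Reduced Hessian [A^T D(x^*) A]_{II} + lam * diag(p(p-1)|x_i|^{p-2}),
    where D has diagonal sigma(a_i^T x^*)(1 - sigma(a_i^T x^*))."""
    I = np.flatnonzero(np.abs(x_star) > tol)
    s = sigmoid(A @ x_star)
    D = s * (1.0 - s)                  # length-m vector of diagonal entries
    AI = A[:, I]
    d2 = p * (p - 1.0) * np.abs(x_star[I]) ** (p - 2.0)
    return AI.T @ (D[:, None] * AI) + lam * np.diag(d2)

# illustrative instance: rows of A are the samples a_i^T
rng = np.random.default_rng(1)
A = rng.standard_normal((50, 4))
x_star = np.array([0.9, 0.0, -1.2, 0.0])
H = reduced_hessian_logistic(A, x_star, lam=0.05, p=0.5)
nonsingular = np.linalg.matrix_rank(H) == H.shape[0]
```

Since every diagonal entry of \(D(x^*)\) lies in \((0, 1/4]\), the data term is positive semidefinite, and the check again concerns whether the negative \(\ell _p\) curvature on the support destroys invertibility.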
Wang, H., Zeng, H., Wang, J. et al. Relating \(\ell _p\) regularization and reweighted \(\ell _1\) regularization. Optim Lett 15, 2639–2660 (2021). https://doi.org/10.1007/s11590-020-01685-x