Local linear convergence of proximal coordinate descent algorithm

Klopfenstein, Quentin; Bertrand, Quentin; Gramfort, Alexandre; Salmon, Joseph; Vaiter, Samuel

doi:10.1007/s11590-023-01976-z

Original Paper
Published: 22 March 2023

Optimization Letters, Volume 18, pages 135–154 (2024)

Quentin Klopfenstein ORCID: orcid.org/0000-0002-5771-6013¹,
Quentin Bertrand²,
Alexandre Gramfort²,
Joseph Salmon³ &
Samuel Vaiter⁴

Abstract

For composite nonsmooth optimization problems that are “regular enough”, proximal gradient descent achieves model identification after a finite number of iterations. For instance, for the Lasso, this implies that the iterates of proximal gradient descent identify the nonzero coefficients after a finite number of steps. The identification property has been shown for various optimization algorithms, such as accelerated gradient descent, Douglas–Rachford, and variance-reduced algorithms; results concerning coordinate descent, however, are scarcer. Identification properties often rely on the framework of “partial smoothness”, which is a powerful but technical tool. In this work, we show that partly smooth functions have a simple characterization when the nonsmooth penalty is separable. In this simplified framework, we prove that cyclic coordinate descent achieves model identification in finite time, which leads to explicit local linear convergence rates for coordinate descent. Extensive experiments on various estimators and on real datasets demonstrate that these rates match empirical results well.
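
To make the identification property concrete, the sketch below runs cyclic proximal coordinate descent on a small Lasso problem, min_x 0.5 * ||A x - b||^2 + lam * ||x||_1, and records the support of the iterate after every epoch. It is an illustration only, not the authors' implementation: the function names (soft_threshold, cyclic_cd_lasso, identification_epoch) and the toy data A, b and lam are assumptions made for this example.

```python
def soft_threshold(z, tau):
    """Proximal operator of tau * |.| evaluated at the scalar z."""
    if z > tau:
        return z - tau
    if z < -tau:
        return z + tau
    return 0.0


def cyclic_cd_lasso(A, b, lam, n_epochs=50):
    """Cyclic proximal coordinate descent for the Lasso (illustrative sketch).

    A is a list of rows. Returns the final iterate and the support
    (indices of nonzero coefficients) recorded after every epoch.
    """
    n, p = len(A), len(A[0])
    cols = [[A[i][j] for i in range(n)] for j in range(p)]
    # Coordinate-wise Lipschitz constants: squared norms of the columns.
    lips = [sum(v * v for v in col) for col in cols]
    x = [0.0] * p
    residual = [-bi for bi in b]  # equals A x - b for x = 0
    supports = []
    for _ in range(n_epochs):
        for j in range(p):  # cyclic rule: coordinates visited in order
            if lips[j] == 0.0:
                continue
            grad_j = sum(c * r for c, r in zip(cols[j], residual))
            new_xj = soft_threshold(x[j] - grad_j / lips[j], lam / lips[j])
            delta = new_xj - x[j]
            if delta != 0.0:
                # Keep the residual A x - b in sync with the updated coordinate.
                residual = [r + c * delta for r, c in zip(residual, cols[j])]
                x[j] = new_xj
        supports.append(tuple(j for j in range(p) if x[j] != 0.0))
    return x, supports


def identification_epoch(supports):
    """First epoch (1-based) from which the recorded support no longer changes."""
    k = len(supports)
    while k > 1 and supports[k - 2] == supports[-1]:
        k -= 1
    return k


# Hypothetical toy data, chosen only for illustration.
A = [[0.0, 1.0, 0.5],
     [1.0, 0.0, 0.5],
     [1.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
b = [1.75, 0.5, 1.0, 0.625]
lam = 0.5

x, supports = cyclic_cd_lasso(A, b, lam)
print("supports after the first epochs:", supports[:3])
print("identification epoch:", identification_epoch(supports))
print("final iterate:", x)
```

With these toy data, the first coefficient enters the support during the first epoch and is set back to zero during the second; from then on the support stays fixed and the algorithm behaves like coordinate descent on a smooth problem restricted to the identified coordinates, which is the regime in which a local linear rate can be expected.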

Data availability

The data used in this article are openly available and can be downloaded from the LIBSVM website: https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/.

Notes

Some local rates are shown in [56], but only under strong hypotheses.

References

Bach, F.: Consistency of the group Lasso and multiple Kernel learning. J. Mach. Learn. Res. 9, 1179–1225 (2008)
Beck, A., Tetruashvili, L.: On the convergence of block coordinate descent type methods. SIAM J. Optim. 23(4), 2037–2060 (2013)
Bertrand, Q., Klopfenstein, Q., Blondel, M., Vaiter, S., Gramfort, A., Salmon, J.: Implicit differentiation of Lasso-type models for hyperparameter optimization. In: International Conference on Machine Learning (2020)
Bertrand, Q., Klopfenstein, Q., Massias, M., Blondel, M., Vaiter, S., Gramfort, A., Salmon, J.: Implicit differentiation for fast hyperparameter selection in non-smooth convex learning. arXiv preprint arXiv:2105.01637 (2021)
Bertsekas, D.P.: On the Goldstein-Levitin-Polyak gradient projection method. IEEE Trans. Autom. Control 21(2), 174–184 (1976)
Bertsekas, D.P.: Convex Optimization Theory, Chapter 1 Exercises and Solutions: Extended Version. Massachusetts Institute of Technology (2009). http://www.athenasc.com/convexdualitysol1.pdf
Boser, B.E., Guyon, I.M., Vapnik, V.N.: A training algorithm for optimal margin classifiers. In: Proceedings of the Fifth Annual Workshop on Computational Learning Theory, pp. 144–152. ACM (1992)
Burke, J.V., Moré, J.J.: On the identification of active constraints. SIAM J. Numer. Anal. 25(5), 1197–1211 (1988)
Chang, C.-C., Lin, C.-J.: LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. 2(3), 27 (2011)
Chen, S.S., Donoho, D.L., Saunders, M.A.: Atomic decomposition by basis pursuit. SIAM J. Sci. Comput. 20(1), 33–61 (1998)
Combettes, P.L., Wajs, V.R.: Signal recovery by proximal forward-backward splitting. Multiscale Model. Simul. 4(4), 1168–1200 (2005)
Fadili, J., Garrigos, G., Malick, J., Peyré, G.: Model consistency for learning with mirror-stratifiable regularizers. In: AISTATS, pp. 1236–1244. PMLR (2019)
Fadili, J., Malick, J., Peyré, G.: Sensitivity analysis for mirror-stratifiable convex functions. SIAM J. Optim. 28(4), 2975–3000 (2018)
Fercoq, O., Richtárik, P.: Accelerated, parallel and proximal coordinate descent. SIAM J. Optim. 25(3), 1997–2013 (2015)
Friedman, J., Hastie, T.J., Höfling, H., Tibshirani, R.: Pathwise coordinate optimization. Ann. Appl. Stat. 1(2), 302–332 (2007)
Friedman, J., Hastie, T.J., Tibshirani, R.: Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33(1), 1–22 (2010)
Hare, W.L.: Identifying active manifolds in regularization problems. In: Fixed-Point Algorithms for Inverse Problems in Science and Engineering, pp. 261–271. Springer, London (2011)
Hare, W.L., Lewis, A.S.: Identifying active constraints via partial smoothness and prox-regularity. J. Convex Anal. 11(2), 251–266 (2004)
Hare, W.L., Lewis, A.S.: Identifying active manifolds. Algorithmic Oper. Res. 2(2), 75–75 (2007)
Hong, M., Wang, X., Razaviyayn, M., Luo, Z.-Q.: Iteration complexity analysis of block coordinate descent methods. Math. Program. 163(1–2), 85–114 (2017)
Iutzeler, F., Malick, J.: Nonsmoothness in machine learning: Specific structure, proximal identification, and applications. Set-Valued Var. Anal. 28(4), 661–678 (2020)
Leventhal, D., Lewis, A.S.: Randomized methods for linear constraints: Convergence rates and conditioning. Math. Oper. Res. 35(3), 641–654 (2010)
Lewis, A.S.: Active sets, nonsmoothness, and sensitivity. SIAM J. Optim. 13(3), 702–725 (2002)
Li, X., Zhao, T., Arora, R., Liu, H., Hong, M.: On faster convergence of cyclic block coordinate descent-type methods for strongly convex minimization. J. Mach. Learn. Res. 18(1), 6741–6764 (2017)
Liang, J., Fadili, J., Peyré, G.: Local linear convergence of forward-backward under partial smoothness. Adv. Neural Inf. Process. Syst. 27, 1970–1978 (2014)
Liang, J., Fadili, J., Peyré, G.: Activity identification and local linear convergence of forward-backward-type methods. SIAM J. Optim. 27(1), 408–437 (2017)
Lions, P.-L., Mercier, B.: Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numer. Anal. 16(6), 964–979 (1979)
Luo, Z.-Q., Tseng, P.: On the convergence of the coordinate descent method for convex differentiable minimization. J. Optim. Theory Appl. 72(1), 7–35 (1992)
Massias, M., Gramfort, A., Salmon, J.: Celer: a fast solver for the Lasso with dual extrapolation. In: ICML 80, pp. 3315–3324 (2018)
Massias, M., Vaiter, S., Gramfort, A., Salmon, J.: Dual extrapolation for sparse generalized linear models. J. Mach. Learn. Res. 21, 1–33 (2020)
Necoara, I., Nesterov, Y., Glineur, F.: Linear convergence of first order methods for non-strongly convex optimization. Math. Program. 175(1–2), 69–107 (2019)
Necoara, I., Patrascu, A.: A random coordinate descent algorithm for optimization problems with composite objective function and linear coupled constraints. Comput. Optim. Appl. 57(2), 307–337 (2014)
Nesterov, Y.: Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM J. Optim. 22(2), 341–362 (2012)
Nutini, J.: Greed is Good: Greedy Optimization Methods for Large-Scale Structured Problems. PhD thesis, University of British Columbia (2018)
Nutini, J., Laradji, I., Schmidt, M.: Let’s Make Block Coordinate Descent Go Fast: Faster Greedy Rules, Message-Passing, Active-Set Complexity, and Superlinear Convergence. arXiv preprint arXiv:1712.08859 (2017)
Nutini, J., Schmidt, M., Hare, W.: Active-set complexity of proximal gradient: How long does it take to find the sparsity pattern? Optim. Lett. 13(4), 645–655 (2019)
Nutini, J., Schmidt, M.W., Laradji, I.H., Friedlander, M.P., Koepke, H.A.: Coordinate descent converges faster with the Gauss-Southwell rule than random selection. In: ICML, pp. 1632–1641 (2015)
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
Poliquin, R., Rockafellar, R.: Prox-regular functions in variational analysis. Trans. Am. Math. Soc. 348(5), 1805–1838 (1996)
Polyak, B.T.: Introduction to Optimization. Optimization Software, Inc., Publications Division, New York (1987)
Poon, C., Liang, J.: Trajectory of alternating direction method of multipliers and adaptive acceleration. Adv. Neural Inf. Process. Syst. 32, 7357–7365 (2019)
Poon, C., Liang, J., Schönlieb, C.-B.: Local convergence properties of SAGA/Prox-SVRG and acceleration. In: International Conference on Machine Learning 90, pp. 4121–4129 (2018)
Qu, Z., Richtárik, P.: Coordinate descent with arbitrary sampling I: algorithms and complexity. Optim. Methods Softw. 31(5), 829–857 (2016)
Qu, Z., Richtárik, P.: Coordinate descent with arbitrary sampling II: expected separable overapproximation. Optim. Methods Softw. 31(5), 858–884 (2016)
Razaviyayn, M., Hong, M., Luo, Z.-Q.: A unified convergence analysis of block successive minimization methods for nonsmooth optimization. SIAM J. Optim. 23(2), 1126–1153 (2013)
Richtárik, P., Takáč, M.: Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function. Math. Program. 144(1–2), 1–38 (2014)
Saha, A., Tewari, A.: On the nonasymptotic convergence of cyclic coordinate descent methods. SIAM J. Optim. 23(1), 576–601 (2013)
Shalev-Shwartz, S., Tewari, A.: Stochastic methods for l1-regularized loss minimization. J. Mach. Learn. Res. 12, 1865–1892 (2011)
Shalev-Shwartz, S., Zhang, T.: Stochastic dual coordinate ascent methods for regularized loss minimization. J. Mach. Learn. Res. 14, 567–599 (2013)
She, J., Schmidt, M.: Linear convergence and support vector identification of sequential minimal optimization. In: 10th NIPS Workshop on Optimization for Machine Learning, vol. 5 (2017)
Shi, H.-J.M., Tu, S., Xu, Y., Yin, W.: A primer on coordinate descent algorithms. arXiv e-prints (2016)
Sun, R., Hong, M.: Improved iteration complexity bounds of cyclic block coordinate descent for convex problems. Adv. Neural Inf. Process. Syst. 28, 1306–1314 (2015)
Tao, S., Boley, D., Zhang, S.: Local linear convergence of ISTA and FISTA on the LASSO problem. SIAM J. Optim. 26(1), 313–336 (2016)
Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B Stat. Methodol. 58(1), 267–288 (1996)
Tseng, P.: Convergence of a block coordinate descent method for nondifferentiable minimization. J. Optim. Theory Appl. 109(3), 475–494 (2001)
Tseng, P., Yun, S.: Block-coordinate gradient descent method for linearly constrained nonsmooth separable optimization. J. Optim. Theory Appl. 140(3), 513 (2009)
Vaiter, S., Golbabaee, M., Fadili, J., Peyré, G.: Model selection with low complexity priors. Inf. Inference: A J. IMA 4(3), 230–287 (2015)
Vaiter, S., Peyré, G., Fadili, J.: Model consistency of partly smooth regularizers. IEEE Trans. Inf. Theory 64(3), 1725–1737 (2018)
Wang, H., Zeng, H., Wang, J.: Convergence rate analysis of proximal iteratively reweighted \(\ell_1\) methods for \(\ell_p\) regularization problems. Optim. Lett., pp. 1–23 (2022)
Wang, H., Zeng, H., Wang, J., Wu, Q.: Relating \(\ell_p\) regularization and reweighted \(\ell_1\) regularization. Optim. Lett. 15(8), 2639–2660 (2021)
Wright, S.J.: Identifiable surfaces in constrained optimization. SIAM J. Control. Optim. 31(4), 1063–1079 (1993)
Wright, S.J.: Accelerated block-coordinate relaxation for regularized optimization. SIAM J. Optim. 22(1), 159–186 (2012)
Xu, Y., Yin, W.: A globally convergent algorithm for nonconvex optimization based on block coordinate update. J. Sci. Comput. 72(2), 700–734 (2017)
Zhang, T.: Solving large scale linear prediction problems using stochastic gradient descent algorithms. In: Proceedings of the Twenty-First International Conference on Machine Learning, p. 116 (2004)
Zhao, P., Yu, B.: On model selection consistency of lasso. J. Mach. Learn. Res. 7, 2541–2563 (2006)
Zou, H., Hastie, T.J.: Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B Stat Methodol. 67, 301–320 (2005)

Author information

Authors and Affiliations

Institut Mathématiques de Bourgogne, Université de Bourgogne Franche-Comté, Dijon, France
Quentin Klopfenstein
INRIA, CEA, Université Paris-Saclay, Palaiseau, France
Quentin Bertrand & Alexandre Gramfort
IMAG, CNRS, Université de Montpellier, Montpellier, France
Joseph Salmon
Centre National de la Recherche Scientifique (CNRS), Université Côte d’Azur, Nice, France
Samuel Vaiter

Corresponding author

Correspondence to Quentin Klopfenstein.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Klopfenstein, Q., Bertrand, Q., Gramfort, A. et al. Local linear convergence of proximal coordinate descent algorithm. Optim Lett 18, 135–154 (2024). https://doi.org/10.1007/s11590-023-01976-z

Received: 06 December 2021
Accepted: 05 January 2023
Published: 22 March 2023
Issue Date: January 2024
DOI: https://doi.org/10.1007/s11590-023-01976-z
