Local linear convergence of proximal coordinate descent algorithm

  • Original Paper

Abstract

For composite nonsmooth optimization problems that are “regular enough”, proximal gradient descent achieves model identification after a finite number of iterations. For the Lasso, for instance, this implies that the iterates of proximal gradient descent identify the nonzero coefficients after a finite number of steps. The identification property has been shown for various optimization algorithms, such as accelerated gradient descent, Douglas–Rachford, or variance-reduced algorithms; results concerning coordinate descent, however, are scarcer. Identification properties often rely on the framework of “partial smoothness”, a powerful but technical tool. In this work, we show that partly smooth functions have a simple characterization when the nonsmooth penalty is separable. In this simplified framework, we prove that cyclic coordinate descent achieves model identification in finite time, which leads to explicit local linear convergence rates for coordinate descent. Extensive experiments on various estimators and on real datasets demonstrate that these rates match empirical results well.
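
Below, as an illustration of the algorithm studied here, is a minimal sketch (not the authors' code), assuming only NumPy: cyclic proximal coordinate descent on a small synthetic Lasso problem, recording when the support of the iterates stops changing (model identification) and the objective values afterwards. The problem sizes, regularization strength, and number of epochs are illustrative choices.

```python
import numpy as np

def soft_thresh(x, tau):
    """Proximal operator of tau * |.| (soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

rng = np.random.default_rng(0)
n, p, s = 100, 50, 5                        # samples, features, true non-zeros
A = rng.standard_normal((n, p))
x_true = np.zeros(p)
x_true[:s] = rng.standard_normal(s)
b = A @ x_true + 0.1 * rng.standard_normal(n)

lam = 0.1 * np.max(np.abs(A.T @ b))         # regularization strength (illustrative)
lips = (A ** 2).sum(axis=0)                 # coordinate-wise Lipschitz constants ||A_j||^2
x = np.zeros(p)
r = b - A @ x                               # residual, kept up to date

supports, objs = [], []
for epoch in range(200):
    for j in range(p):                      # one cyclic pass over the coordinates
        old = x[j]
        grad_j = -A[:, j] @ r               # j-th partial derivative of 0.5 * ||Ax - b||^2
        x[j] = soft_thresh(old - grad_j / lips[j], lam / lips[j])
        if x[j] != old:
            r += A[:, j] * (old - x[j])     # keep the residual consistent with x
    supports.append(tuple(np.flatnonzero(x)))
    objs.append(0.5 * r @ r + lam * np.abs(x).sum())

# After finitely many epochs the support typically freezes (model identification);
# from that point on, the objective decreases at a linear rate.
first_stable = next(k for k, S in enumerate(supports) if S == supports[-1])
print("support identified at epoch", first_stable)
print("last objective values:", objs[-3:])
```

On this toy problem, the printed epoch is the first pass after which the computed support no longer changes; plotting the recorded objective values on a logarithmic scale past that epoch makes the local linear rate visible.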

Data availability

The data used in this article are open and can be downloaded from the LIBSVM website: https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/.
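
As a minimal sketch of how such a file can be read once downloaded, the snippet below uses scikit-learn's load_svmlight_file; the file name "news20.binary.bz2" is only an example taken from that page, not a dataset prescribed by the article.

```python
# Hypothetical usage: load a LIBSVM-formatted dataset downloaded from the URL above.
from sklearn.datasets import load_svmlight_file

# Returns a sparse CSR design matrix X and the target vector y; .bz2/.gz files
# are decompressed on the fly based on their extension.
X, y = load_svmlight_file("news20.binary.bz2")
print(X.shape, y.shape)
```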

Notes

  1. Note that some local rates are shown in [56], but under strong hypotheses.

References

  1. Bach, F.: Consistency of the group Lasso and multiple Kernel learning. J. Mach. Learn. Res. 9, 1179–1225 (2008)

  2. Beck, A., Tetruashvili, L.: On the convergence of block coordinate descent type methods. SIAM J. Optim. 23(4), 2037–2060 (2013)

  3. Bertrand, Q., Klopfenstein, Q., Blondel, M., Vaiter, S., Gramfort, A., Salmon, J.: Implicit differentiation of lasso-type models for hyperparameter optimization. In: International Conference on Machine Learning (2020)

  4. Bertrand, Q., Klopfenstein, Q., Massias, M., Blondel, M., Vaiter, S., Gramfort, A., Salmon, J.: Implicit differentiation for fast hyperparameter selection in non-smooth convex learning. arXiv preprint arXiv:2105.01637 (2021)

  5. Bertsekas, D.P.: On the Goldstein-Levitin-Polyak gradient projection method. IEEE Trans. Autom. Control 21(2), 174–184 (1976)

  6. Bertsekas, D.P.: Convex Optimization Theory, Chapter 1 Exercises and Solutions: Extended Version. Massachusetts Institute of Technology (2009). http://www.athenasc.com/convexdualitysol1.pdf

  7. Boser, B.E., Guyon, I.M., Vapnik, V.N.: A training algorithm for optimal margin classifiers. In: Proceedings of the Fifth Annual Workshop on Computational Learning Theory, pp. 144–152. ACM (1992)

  8. Burke, J.V., Moré, J.J.: On the identification of active constraints. SIAM J. Numer. Anal. 25(5), 1197–1211 (1988)

  9. Chang, C.-C., Lin, C.-J.: LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. 2(3), 27 (2011)

  10. Chen, S.S., Donoho, D.L., Saunders, M.A.: Atomic decomposition by basis pursuit. SIAM J. Sci. Comput. 20(1), 33–61 (1998)

  11. Combettes, P.L., Wajs, V.R.: Signal recovery by proximal forward-backward splitting. Multiscale Model. Simul. 4(4), 1168–1200 (2005)

  12. Fadili, J., Garrigos, G., Malick, J., Peyré, G.: Model consistency for learning with mirror-stratifiable regularizers. In: AISTATS, pp. 1236–1244. PMLR (2019)

  13. Fadili, J., Malick, J., Peyré, G.: Sensitivity analysis for mirror-stratifiable convex functions. SIAM J. Optim. 28(4), 2975–3000 (2018)

  14. Fercoq, O., Richtárik, P.: Accelerated, parallel and proximal coordinate descent. SIAM J. Optim. 25(3), 1997–2013 (2015)

  15. Friedman, J., Hastie, T.J., Höfling, H., Tibshirani, R.: Pathwise coordinate optimization. Ann. Appl. Stat. 1(2), 302–332 (2007)

  16. Friedman, J., Hastie, T.J., Tibshirani, R.: Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33(1), 1–22 (2010)

  17. Hare, W.L.: Identifying active manifolds in regularization problems. In: Fixed-Point Algorithms for Inverse Problems in Science and Engineering, pp. 261–271. Springer, London (2011)

  18. Hare, W.L., Lewis, A.S.: Identifying active constraints via partial smoothness and prox-regularity. J. Convex Anal. 11(2), 251–266 (2004)

  19. Hare, W.L., Lewis, A.S.: Identifying active manifolds. Algorithmic Oper. Res. 2(2), 75–75 (2007)

  20. Hong, M., Wang, X., Razaviyayn, M., Luo, Z.-Q.: Iteration complexity analysis of block coordinate descent methods. Math. Program. 163(1–2), 85–114 (2017)

  21. Iutzeler, F., Malick, J.: Nonsmoothness in machine learning: Specific structure, proximal identification, and applications. Set-Valued Var. Anal. 28(4), 661–678 (2020)

  22. Leventhal, D., Lewis, A.S.: Randomized methods for linear constraints: Convergence rates and conditioning. Math. Oper. Res. 35(3), 641–654 (2010)

  23. Lewis, A.S.: Active sets, nonsmoothness, and sensitivity. SIAM J. Optim. 13(3), 702–725 (2002)

  24. Li, X., Zhao, T., Arora, R., Liu, H., Hong, M.: On faster convergence of cyclic block coordinate descent-type methods for strongly convex minimization. J. Mach. Learn. Res. 18(1), 6741–6764 (2017)

  25. Liang, J., Fadili, J., Peyré, G.: Local linear convergence of forward-backward under partial smoothness. Adv. Neural Inf. Process. Syst. 27, 1970–1978 (2014)

  26. Liang, J., Fadili, J., Peyré, G.: Activity identification and local linear convergence of forward-backward-type methods. SIAM J. Optim. 27(1), 408–437 (2017)

  27. Lions, P.-L., Mercier, B.: Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numer. Anal. 16(6), 964–979 (1979)

  28. Luo, Z.-Q., Tseng, P.: On the convergence of the coordinate descent method for convex differentiable minimization. J. Optim. Theory Appl. 72(1), 7–35 (1992)

  29. Massias, M., Gramfort, A., Salmon, J.: Celer: a fast solver for the Lasso with dual extrapolation. In: ICML 80, pp. 3315–3324 (2018)

  30. Massias, M., Vaiter, S., Gramfort, A., Salmon, J.: Dual extrapolation for sparse generalized linear models. J. Mach. Learn. Res. 21, 1–33 (2020)

  31. Necoara, I., Nesterov, Y., Glineur, F.: Linear convergence of first order methods for non-strongly convex optimization. Math. Program. 175(1–2), 69–107 (2019)

  32. Necoara, I., Patrascu, A.: A random coordinate descent algorithm for optimization problems with composite objective function and linear coupled constraints. Comput. Optim. Appl. 57(2), 307–337 (2014)

  33. Nesterov, Y.: Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM J. Optim. 22(2), 341–362 (2012)

  34. Nutini, J.: Greed is Good: Greedy Optimization Methods for Large-Scale Structured Problems. PhD thesis, University of British Columbia (2018)

  35. Nutini, J., Laradji, I., Schmidt, M.: Let’s make block coordinate descent go fast: faster greedy rules, message-passing, active-set complexity, and superlinear convergence. arXiv preprint arXiv:1712.08859 (2017)

  36. Nutini, J., Schmidt, M., Hare, W.: Active-set complexity of proximal gradient: How long does it take to find the sparsity pattern? Optim. Lett. 13(4), 645–655 (2019)

  37. Nutini, J., Schmidt, M.W., Laradji, I.H., Friedlander, M.P., Koepke, H.A.: Coordinate descent converges faster with the Gauss-Southwell rule than random selection. In: ICML, pp. 1632–1641 (2015)

  38. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)

  39. Poliquin, R., Rockafellar, R.: Prox-regular functions in variational analysis. Trans. Am. Math. Soc. 348(5), 1805–1838 (1996)

  40. Polyak, B.T.: Introduction to Optimization. Optimization Software. Inc., Publications Division, New York (1987)

  41. Poon, C., Liang, J.: Trajectory of alternating direction method of multipliers and adaptive acceleration. Adv. Neural Inf. Process. Syst. 32, 7357–7365 (2019)

  42. Poon, C., Liang, J., Schönlieb, C.-B.: Local convergence properties of SAGA/Prox-SVRG and acceleration. In: International Conference on Machine Learning 90, pp. 4121–4129 (2018)

  43. Qu, Z., Richtárik, P.: Coordinate descent with arbitrary sampling i: Algorithms and complexity. Optim. Methods Softw. 31(5), 829–857 (2016)

  44. Qu, Z., Richtárik, P.: Coordinate descent with arbitrary sampling ii: Expected separable overapproximation. Optim. Methods Softw. 31(5), 858–884 (2016)

  45. Razaviyayn, M., Hong, M., Luo, Z.-Q.: A unified convergence analysis of block successive minimization methods for nonsmooth optimization. SIAM J. Optim. 23(2), 1126–1153 (2013)

  46. Richtárik, P., Takáč, M.: Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function. Math. Program. 144(1–2), 1–38 (2014)

  47. Saha, A., Tewari, A.: On the nonasymptotic convergence of cyclic coordinate descent methods. SIAM J. Optim. 23(1), 576–601 (2013)

  48. Shalev-Shwartz, S., Tewari, A.: Stochastic methods for l1-regularized loss minimization. J. Mach. Learn. Res. 12, 1865–1892 (2011)

  49. Shalev-Shwartz, S., Zhang, T.: Stochastic dual coordinate ascent methods for regularized loss minimization. J. Mach. Learn. Res. 14, 567–599 (2013)

  50. She, J., Schmidt, M.: Linear convergence and support vector identification of sequential minimal optimization. In: 10th NIPS Workshop on Optimization for Machine Learning, vol. 5 (2017)

  51. Shi, H.-J.M., Tu, S., Xu, Y., Yin, W.: A primer on coordinate descent algorithms. arXiv e-prints (2016)

  52. Sun, R., Hong, M.: Improved iteration complexity bounds of cyclic block coordinate descent for convex problems. Adv. Neural Inf. Process. Syst. 28, 1306–1314 (2015)

  53. Tao, S., Boley, D., Zhang, S.: Local linear convergence of ISTA and FISTA on the LASSO problem. SIAM J. Optim. 26(1), 313–336 (2016)

  54. Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B Stat. Methodol. 58(1), 267–288 (1996)

  55. Tseng, P.: Convergence of a block coordinate descent method for nondifferentiable minimization. J. Optim. Theory Appl. 109(3), 475–494 (2001)

  56. Tseng, P., Yun, S.: Block-coordinate gradient descent method for linearly constrained nonsmooth separable optimization. J. Optim. Theory Appl. 140(3), 513 (2009)

  57. Vaiter, S., Golbabaee, M., Fadili, J., Peyré, G.: Model selection with low complexity priors. Inf. Inference: A J. IMA 4(3), 230–287 (2015)

  58. Vaiter, S., Peyré, G., Fadili, J.: Model consistency of partly smooth regularizers. IEEE Trans. Inf. Theory 64(3), 1725–1737 (2018)

  59. Wang, H., Zeng, H., Wang, J.: Convergence rate analysis of proximal iteratively reweighted \(\ell _1\) methods for \(\ell _p\) regularization problems. Optim. Lett., pp. 1–23 (2022)

  60. Wang, H., Zeng, H., Wang, J., Wu, Q.: Relating \(\ell _p\) regularization and reweighted \(\ell _1\) regularization. Optim. Lett. 15(8), 2639–2660 (2021)

  61. Wright, S.J.: Identifiable surfaces in constrained optimization. SIAM J. Control. Optim. 31(4), 1063–1079 (1993)

  62. Wright, S.J.: Accelerated block-coordinate relaxation for regularized optimization. SIAM J. Optim. 22(1), 159–186 (2012)

  63. Xu, Y., Yin, W.: A globally convergent algorithm for nonconvex optimization based on block coordinate update. J. Sci. Comput. 72(2), 700–734 (2017)

  64. Zhang, T.: Solving large scale linear prediction problems using stochastic gradient descent algorithms. In: Proceedings of the Twenty-First International Conference on Machine Learning, p. 116 (2004)

  65. Zhao, P., Yu, B.: On model selection consistency of lasso. J. Mach. Learn. Res. 7, 2541–2563 (2006)

  66. Zou, H., Hastie, T.J.: Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B Stat Methodol. 67, 301–320 (2005)

Author information

Corresponding author

Correspondence to Quentin Klopfenstein.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Klopfenstein, Q., Bertrand, Q., Gramfort, A. et al. Local linear convergence of proximal coordinate descent algorithm. Optim Lett 18, 135–154 (2024). https://doi.org/10.1007/s11590-023-01976-z

  • DOI: https://doi.org/10.1007/s11590-023-01976-z
