
Accelerated first-order methods for large-scale convex optimization: nearly optimal complexity under strong convexity


Abstract

We introduce four accelerated (sub)gradient algorithms (ASGA) for solving several classes of convex optimization problems. More specifically, we propose two estimation sequences majorizing the objective function and develop two iterative schemes for each of them. In both cases, the first scheme requires the smoothness parameter and a Hölder constant, while the second scheme is parameter-free (except for the strong convexity parameter, which is set to zero if it is not available) at the price of applying a finitely terminated backtracking line search. The proposed algorithms attain the optimal complexity for smooth problems with Lipschitz continuous gradients, for nonsmooth problems with bounded variation of subgradients, and for weakly smooth problems with Hölder continuous gradients. Further, for strongly convex problems they are optimal in the smooth case and nearly optimal in the nonsmooth and weakly smooth cases. Finally, numerical results for some applications in sparse optimization and machine learning are reported, which confirm the theoretical results.
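For orientation, the sketch below illustrates the general mechanism behind such methods in the smooth case: a Nesterov-type extrapolation step combined with a finitely terminated backtracking search that estimates the unknown smoothness constant. It is a generic accelerated gradient loop, not the ASGA schemes of this paper, and the names f, grad_f, L0, and eta are illustrative assumptions.

```python
# Minimal sketch (not the paper's ASGA): Nesterov-type accelerated gradient
# method with a finitely terminated backtracking line search for an unknown
# Lipschitz constant. Assumes f is convex with Lipschitz-continuous gradient.
import numpy as np

def accelerated_gradient(f, grad_f, x0, L0=1.0, eta=2.0, max_iter=500, tol=1e-8):
    x = y = np.asarray(x0, dtype=float)
    t = 1.0
    L = L0                      # current estimate of the smoothness constant
    for _ in range(max_iter):
        g = grad_f(y)
        # Backtracking: increase L until the quadratic upper bound holds at y.
        while True:
            x_new = y - g / L
            if f(x_new) <= f(y) + g @ (x_new - y) + 0.5 * L * np.sum((x_new - y) ** 2):
                break
            L *= eta
        # Nesterov extrapolation (momentum) step.
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)
        if np.linalg.norm(x_new - x) <= tol:
            return x_new
        x, t = x_new, t_new
    return x

# Example: minimize a strongly convex quadratic f(x) = 0.5*x'Ax - b'x.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
x_star = accelerated_gradient(lambda x: 0.5 * x @ A @ x - b @ x,
                              lambda x: A @ x - b, np.zeros(2))
```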



Acknowledgements

I would like to thank Arnold Neumaier for his useful comments on this paper. I am also grateful to the anonymous referees and the associate editor for their constructive comments and suggestions, which improved the quality of this paper.

Author information

Corresponding author

Correspondence to Masoud Ahookhosh.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

The numerical results for the elastic net minimization and the support vector machine are given in Tables 3 and 4, respectively; a schematic sketch of the two objective functions is given after the table captions.

Table 3 Numerical results of NSDSG, NESCO, NESUN, ASGA-1, ASGA-2, ASGA-3, and ASGA-4 for the elastic net minimization problems (54) and (55)
Table 4 Numerical results of NSDSG, NESUN, ASGA-1, ASGA-2, ASGA-3, and ASGA-4 for the binary classification with linear support vector machines (57)
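For orientation, the following is a hedged sketch of the two classes of objectives behind these tables, namely elastic net regression and binary classification with a linear SVM and hinge loss. The precise formulations (54), (55), and (57), the data sets, and the regularization weights are those given in the paper; the parameters lambda1, lambda2, and lam below are placeholder assumptions.

```python
# Hedged sketch of the two problem classes tested in Tables 3 and 4.
import numpy as np

def elastic_net_objective(x, A, b, lambda1=1e-2, lambda2=1e-2):
    """0.5*||Ax - b||^2 + lambda1*||x||_1 + 0.5*lambda2*||x||^2 (nonsmooth, strongly convex)."""
    r = A @ x - b
    return 0.5 * r @ r + lambda1 * np.sum(np.abs(x)) + 0.5 * lambda2 * x @ x

def svm_hinge_objective(w, X, y, lam=1e-2):
    """Regularized hinge loss for labels y in {-1, +1}: mean(max(0, 1 - y*Xw)) + 0.5*lam*||w||^2."""
    margins = 1.0 - y * (X @ w)
    return np.mean(np.maximum(0.0, margins)) + 0.5 * lam * w @ w

def svm_hinge_subgradient(w, X, y, lam=1e-2):
    """One valid subgradient of the hinge-loss objective (nonsmooth where a margin equals 1)."""
    margins = 1.0 - y * (X @ w)
    active = (margins > 0).astype(float)
    return -(X.T @ (active * y)) / X.shape[0] + lam * w
```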


About this article


Cite this article

Ahookhosh, M. Accelerated first-order methods for large-scale convex optimization: nearly optimal complexity under strong convexity. Math Meth Oper Res 89, 319–353 (2019). https://doi.org/10.1007/s00186-019-00674-w

