
Gradient methods for minimizing composite functions

  • Full Length Paper
  • Series B
Mathematical Programming

Abstract

In this paper we analyze several new methods for solving optimization problems whose objective function is a sum of two terms: one is smooth and given by a black-box oracle, and the other is a simple general convex function with known structure. Despite the absence of good properties of the sum, such problems, both in convex and nonconvex cases, can be solved with the efficiency typical of the smooth part of the objective. For convex problems of the above structure, we consider primal and dual variants of the gradient method (with convergence rate \(O\left({1 \over k}\right)\)), and an accelerated multistep version with convergence rate \(O\left({1 \over k^2}\right)\), where \(k\) is the iteration counter. For nonconvex problems with this structure, we prove convergence to a point from which there is no descent direction. In contrast, we show that for general nonsmooth, nonconvex problems, even resolving the question of whether a descent direction exists from a point is NP-hard. For all methods, we suggest efficient “line search” procedures and show that the additional computational work needed to estimate the unknown problem class parameters can only multiply the complexity of each iteration by a small constant factor. We also present preliminary computational results, which confirm the superiority of the accelerated scheme.
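
To make the composite setting concrete, here is a minimal sketch of the convex-case schemes described above, specialized to the simple term \(\Psi(x) = \lambda\|x\|_1\), whose proximal step is closed-form soft-thresholding. The FISTA-style momentum rule, the backtracking doubling rule, and the quadratic test problem are illustrative assumptions standing in for the paper's exact algorithms (see [11] and the paper body for those).

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1: componentwise soft-thresholding."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def composite_gradient(grad_f, L, lam, x0, iters):
    """Basic scheme, O(1/k) rate in function value for convex f with
    L-Lipschitz gradient: x <- prox_{(lam/L)||.||_1}(x - grad_f(x)/L)."""
    x = x0.copy()
    for _ in range(iters):
        x = soft_threshold(x - grad_f(x) / L, lam / L)
    return x

def accelerated_composite_gradient(grad_f, L, lam, x0, iters):
    """Accelerated multistep scheme, O(1/k^2) rate. FISTA-style momentum
    is used here as a stand-in for the paper's accelerated method."""
    x, y, t = x0.copy(), x0.copy(), 1.0
    for _ in range(iters):
        x_next = soft_threshold(y - grad_f(y) / L, lam / L)
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_next + ((t - 1.0) / t_next) * (x_next - x)
        x, t = x_next, t_next
    return x

def backtracking_step(f, grad_f, lam, y, L_est, eta=2.0):
    """One composite step with a backtracking estimate of the unknown L
    (the 'line search' idea; the doubling rule eta=2 is an assumption).
    Accepts once the standard quadratic upper bound on f holds at the trial point."""
    g = grad_f(y)
    L = L_est
    while True:
        x = soft_threshold(y - g / L, lam / L)
        d = x - y
        if f(x) <= f(y) + g @ d + 0.5 * L * (d @ d):
            return x, L
        L *= eta

# Illustrative test problem: min_x 0.5*||A x - b||^2 + lam*||x||_1
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 100))
b = rng.standard_normal(40)
lam = 0.1
L = np.linalg.norm(A, 2) ** 2              # Lipschitz constant of grad f
grad_f = lambda x: A.T @ (A @ x - b)
x_sparse = accelerated_composite_gradient(grad_f, L, lam, np.zeros(100), 500)
```

Each rejected backtracking trial costs one extra prox step and function evaluation, which is consistent with the claim above that estimating the unknown class parameters only multiplies the per-iteration cost by a small constant factor.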


Notes

  1. An interested reader can find a good survey of the literature, existing minimization techniques, and new methods in [3] and [5].

  2. However, this idea has a much longer history. To the best of our knowledge, for the general framework this technique was originally developed in [4].

  3. In this paper, for the sake of simplicity, we restrict ourselves to Euclidean norms only. The extension to the general case can be done in a standard way using Bregman distances (e.g., [10]); the standard definition is recalled below.
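
For reference, the Bregman-distance construction this note alludes to (a textbook definition, not taken from this paper) replaces the Euclidean proximity term as follows:

```latex
% Bregman distance generated by a differentiable strictly convex function h:
\[
  D_h(x, y) \;=\; h(x) - h(y) - \langle \nabla h(y),\, x - y \rangle .
\]
% For h(x) = (1/2)||x||_2^2 this reduces to the Euclidean proximity term
% D_h(x, y) = (1/2)||x - y||_2^2 used throughout the paper.
```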

References

  1. Chen, S., Donoho, D., Saunders, M.: Atomic decomposition by basis pursuit. SIAM J. Sci. Comput. 20, 33–61 (1998)

  2. Claerbout, J., Muir, F.: Robust modelling of erratic data. Geophysics 38, 826–844 (1973)

  3. Figueiredo, M., Nowak, R., Wright, S.J.: Gradient projection for sparse reconstruction: application to compressed sensing and other inverse problems. Submitted for publication

  4. Fukushima, M., Mine, H.: A generalized proximal point algorithm for certain nonconvex problems. Int. J. Sys. Sci. 12(8), 989–1000 (1981)

  5. Kim, S.-J., Koh, K., Lustig, M., Boyd, S., Gorinevsky, D.: A method for large-scale \(l_1\)-regularized least-squares problems with applications in signal processing and statistics. Research report, Stanford University, March 20, 2007

  6. Levy, S., Fullagar, P.: Reconstruction of a sparse spike train from a portion of its spectrum and application to high-resolution deconvolution. Geophysics 46, 1235–1243 (1981)

  7. Miller, A.: Subset Selection in Regression. Chapman and Hall, London (2002)

  8. Nemirovsky, A., Yudin, D.: Informational Complexity and Efficient Methods for Solution of Convex Extremal Problems. Wiley, New York (1983)

  9. Nesterov, Y.: Introductory Lectures on Convex Optimization. Kluwer, Boston (2004)

  10. Nesterov, Y.: Smooth minimization of non-smooth functions. Math. Program. (A) 103(1), 127–152 (2005)

  11. Nesterov, Y.: Gradient methods for minimizing composite objective function. CORE Discussion Paper #2007/76, CORE (2007)

  12. Nesterov, Y.: Rounding of convex sets and efficient gradient methods for linear programming problems. Optim. Methods Softw. 23(1), 109–135 (2008)

  13. Nesterov, Y.: Accelerating the cubic regularization of Newton’s method on convex problems. Math. Program. 112(1), 159–181 (2008)

  14. Nesterov, Y., Nemirovskii, A.: Interior Point Polynomial Methods in Convex Programming: Theory and Applications. SIAM, Philadelphia (1994)

  15. Ortega, J., Rheinboldt, W.: Iterative Solution of Nonlinear Equations in Several Variables. Academic Press, New York (1970)

  16. Santosa, F., Symes, W.: Linear inversion of band-limited reflection seismograms. SIAM J. Sci. Stat. Comput. 7, 1307–1330 (1986)

  17. Taylor, H., Banks, S., McCoy, J.: Deconvolution with the \(l_1\) norm. Geophysics 44, 39–52 (1979)

  18. Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. B 58, 267–288 (1996)

  19. Tropp, J.: Just relax: convex programming methods for identifying sparse signals. IEEE Trans. Inf. Theory 52, 1030–1051 (2006)

  20. Wright, S.J.: Solving \(l_{1}\)-Regularized Regression Problems. Talk at International Conference “Combinatorics and Optimization”, Waterloo (June 2007)

Acknowledgments

The author would like to thank M. Overton, Y. Xia, and anonymous referees for numerous useful suggestions.

Author information

Correspondence to Yu. Nesterov.

Additional information

Dedicated to Claude Lemaréchal on the Occasion of his 65th Birthday.

The author acknowledges the support from Office of Naval Research grant # N000140811104: Efficiently Computable Compressed Sensing.

About this article

Cite this article

Nesterov, Y. Gradient methods for minimizing composite functions. Math. Program. 140, 125–161 (2013). https://doi.org/10.1007/s10107-012-0629-5
