
Inexact proximal stochastic gradient method for convex composite optimization


Abstract

We study an inexact proximal stochastic gradient (IPSG) method for convex composite optimization, where the objective function is the sum of an average of a large number of smooth convex functions and a convex, but possibly nonsmooth, function. Variance reduction techniques are incorporated to reduce the variance of the stochastic gradient estimates. The main feature of the IPSG algorithm is that it allows the proximal subproblems to be solved inexactly while still guaranteeing global convergence with desirable complexity bounds. Several subproblem stopping criteria are proposed. Global convergence and component gradient complexity bounds are derived both when the objective function is strongly convex and when it is merely convex. Preliminary numerical experiments demonstrate the overall efficiency of the IPSG algorithm.
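
As a rough illustration of the kind of iteration described above, the sketch below combines an SVRG-style variance-reduced gradient estimate with a proximal subproblem that is solved only approximately, to a prescribed tolerance. The concrete choices here (a least-squares loss, a one-dimensional total-variation regularizer whose proximal operator has no closed form, a duality-gap stopping rule for the subproblem, and the step size and tolerance schedule) are assumptions made for the example, not the specific stopping criteria or complexity analysis of the paper.

```python
# Minimal illustrative sketch: inexact proximal stochastic gradient with
# SVRG-style variance reduction.  The loss, regularizer, inner stopping rule,
# and all parameter choices are assumptions for this example only.
import numpy as np


def make_problem(n=200, d=50, lam=0.1, seed=0):
    """Random least-squares data: F(x) = (1/2n)||Ax - b||^2 + lam * ||Dx||_1."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((n, d)), rng.standard_normal(n), lam


def inexact_prox_tv(z, eta, lam, eps, max_inner=500):
    """Approximately solve min_u ||u - z||^2/(2*eta) + lam*||Du||_1, with
    (Du)_i = u_{i+1} - u_i, by projected gradient ascent on the dual
    max_{|w| <= lam} w^T D z - (eta/2)||D^T w||^2; stop once the duality gap
    lam*||Du||_1 - w^T Du falls below eps (an illustrative stopping rule)."""
    w = np.zeros(z.size - 1)              # dual variable attached to Du
    step = 1.0 / (4.0 * eta)              # safe step: ||D||^2 <= 4 for 1-D differences
    for _ in range(max_inner):
        DTw = np.zeros(z.size)            # D^T w
        DTw[:-1] -= w
        DTw[1:] += w
        u = z - eta * DTw                 # primal point recovered from the dual iterate
        Du = np.diff(u)
        if lam * np.abs(Du).sum() - w @ Du <= eps:   # duality gap small enough
            break
        w = np.clip(w + step * Du, -lam, lam)        # projected dual ascent step
    return u


def ipsg(A, b, lam, epochs=15, seed=1):
    """Outer loop: variance-reduced stochastic gradient plus inexact proximal step."""
    n, d = A.shape
    eta = 0.1 / (A ** 2).sum(axis=1).max()           # conservative step from max_i ||a_i||^2
    rng = np.random.default_rng(seed)
    x_tilde = np.zeros(d)
    for s in range(epochs):
        full_grad = A.T @ (A @ x_tilde - b) / n      # snapshot gradient for variance reduction
        eps_s = 1.0 / (s + 1) ** 3                   # decreasing subproblem tolerance (illustrative)
        x = x_tilde.copy()
        for _ in range(n):                           # one stochastic pass per epoch
            i = rng.integers(n)
            gi_x = A[i] * (A[i] @ x - b[i])          # component gradient at current point
            gi_xt = A[i] * (A[i] @ x_tilde - b[i])   # same component at the snapshot
            v = gi_x - gi_xt + full_grad             # variance-reduced gradient estimate
            x = inexact_prox_tv(x - eta * v, eta, lam, eps_s)
        x_tilde = x
    return x_tilde


A, b, lam = make_problem()
x = ipsg(A, b, lam)
print("final objective:",
      0.5 * np.mean((A @ x - b) ** 2) + lam * np.abs(np.diff(x)).sum())
```

The essential point mirrors the abstract: the inner solver is stopped by a computable criterion (here, the duality gap of the proximal subproblem) rather than run to optimality, and the tolerance eps_s is tightened across epochs.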


Notes

  1. The datasets are available at http://www.gems-system.org.


Author information

Corresponding author

Correspondence to Xiao Wang.

Additional information

This research is partially supported by the National Natural Science Foundation of China (Grant 11301505) and the National Science Foundation of the USA (Grant 1522654).


About this article


Cite this article

Wang, X., Wang, S. & Zhang, H. Inexact proximal stochastic gradient method for convex composite optimization. Comput Optim Appl 68, 579–618 (2017). https://doi.org/10.1007/s10589-017-9932-7

