Abstract
We consider a mini-batch stochastic Bregman proximal gradient method and a mini-batch stochastic Bregman proximal extragradient method for stochastic convex composite optimization problems. We propose a simplified and unified convergence analysis framework that yields almost sure convergence properties and expected convergence rates for the mini-batch stochastic Bregman proximal gradient method and its variants. The same framework also covers the mini-batch stochastic Bregman proximal extragradient method, whose convergence has seldom been discussed in the literature. Notably, neither the standard uniformly bounded variance assumption nor the usual Lipschitz continuity assumption on the gradient is required in the analysis.
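For orientation, the generic update behind these methods can be sketched as follows; the notation ($f$, $r$, $h$, $D_h$, $\alpha_k$, $m_k$) is illustrative and not taken verbatim from the paper. For a composite objective $F(x)=\mathbb{E}_{\xi}[f(x,\xi)]+r(x)$, a mini-batch gradient estimate $g_k=\frac{1}{m_k}\sum_{i=1}^{m_k}\nabla_x f(x_k,\xi_{k,i})$, and the Bregman distance $D_h(x,y)=h(x)-h(y)-\langle\nabla h(y),\,x-y\rangle$ induced by a convex kernel $h$, a stochastic Bregman proximal gradient step takes the form

$$x_{k+1} \in \operatorname*{arg\,min}_{x}\Big\{\langle g_k, x\rangle + r(x) + \tfrac{1}{\alpha_k}\, D_h(x, x_k)\Big\}.$$

An extragradient variant of this scheme performs two such steps per iteration: an extrapolation step from $x_k$ to an intermediate point $y_k$, followed by the actual update from $x_k$ using a fresh mini-batch gradient estimate evaluated at $y_k$.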


Acknowledgements
The author would like to thank the referees and the associate editor for their helpful comments and suggestions. This work was partially supported by the National Natural Science Foundation of China (No. 11871135) and the Fundamental Research Funds for the Central Universities (No. DUT19K46).
Additional information
Communicated by Alfredo N. Iusem.
Cite this article
Xiao, X. A Unified Convergence Analysis of Stochastic Bregman Proximal Gradient and Extragradient Methods. J Optim Theory Appl 188, 605–627 (2021). https://doi.org/10.1007/s10957-020-01799-3