Abstract
We consider optimization methods for convex minimization problems under inexact information on the objective function. We introduce an inexact model of the objective, which covers as particular cases the inexact oracle [16] and the relative smoothness condition [36]. We analyze a gradient method that uses this inexact model and obtain convergence rates for convex and strongly convex problems. To show potential applications of our general framework, we consider three particular problems. The first is clustering by the electoral model introduced in [41]. The second is approximating the optimal transport distance, for which we propose a Proximal Sinkhorn algorithm. The third is approximating the optimal transport barycenter, for which we propose a Proximal Iterative Bregman Projections algorithm. We also illustrate the practical performance of our algorithms by numerical experiments.
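As a minimal illustration of the framework (a sketch of ours, not the paper's code), the gradient method with an inexact \((\delta, L)\)-model replaces the usual linearization by a model \(\psi_\delta(y, x)\) and at each iterate minimizes \(\psi_\delta(y, x^k) + L V(y, x^k)\) over \(y\), where \(V\) is a Bregman divergence; see [75, 76] for the precise definitions. With \(\psi_\delta(y, x) = \langle \nabla f(x), y - x \rangle\) and \(V(y, x) = \frac{1}{2}\Vert y - x\Vert _2^2\) this recovers ordinary gradient descent. The function names below are illustrative.

```python
import numpy as np

def model_gradient_method(model_step, x0, n_iters=100):
    # Generic scheme: x_{k+1} = argmin_y psi(y, x_k) + L * V(y, x_k).
    # `model_step` is assumed to return that argmin at the current point.
    x = x0
    for _ in range(n_iters):
        x = model_step(x)
    return x

# Euclidean special case: psi(y, x) = <grad f(x), y - x> and
# V(y, x) = ||y - x||^2 / 2 give the step x - grad f(x) / L.
L = 4.0
grad_f = lambda x: 2.0 * x                    # gradient of f(x) = ||x||^2
euclidean_step = lambda x: x - grad_f(x) / L  # closed-form argmin of the model step
print(model_gradient_method(euclidean_step, np.ones(3)))  # -> near the minimizer 0
```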
Notes
- 1.
Here and below, \(f(n) = \widetilde{O}(g(n))\) means that for all (large) \(n\), \(f(n) \le \tilde{C}\cdot (\ln n)^r g(n)\) for some constants \(\tilde{C} > 0\) and \(r \ge 0\). Typically \(r = 1\), but not in this particular case. If \(r=0\), then \(\widetilde{O}(\cdot ) = O(\cdot )\).
- 2.
One can find the proof in Appendix E of the full version of the paper [51].
- 3.
This bound is rough, and \(\bar{c}_k\) is typically smaller in practice. By proper rounding of \(\pi ^k\) one can guarantee (without loss of generality) that \(\pi ^k_{ij}\ge \varepsilon /(2 n^2 \Vert C\Vert _\infty )\), which gives
$$ \frac{\bar{c}_k}{L} = \frac{\Vert C\Vert _\infty }{L} + \ln \left( \frac{2 n^2 \Vert C\Vert _\infty }{\varepsilon }\right) . $$In practice, however, there is often no need to round after each outer iteration; see the sketch after these notes for one possible rounding scheme.
- 4.
- 5.
Strictly speaking, at the moment we cannot verify all the details of the proof of the \(\tilde{O}(n^2/\varepsilon )\) estimate. Moreover, the methods proposed in [7, 47] are mainly theoretical, like the Lee-Sidford method for the OT problem with complexity \(\tilde{O}(n^{2.5})\) [35]. For now it is hardly possible to implement these methods so that their practical efficiency matches the theoretical one.
- 6.
The code is available at https://github.com/dmivilensky/Proximal-Sinkhorn-algorithm; a simplified sketch of the outer loop is given after these notes.
- 7.
Figures 5–8 are given in the more complete version of the text at https://arxiv.org/abs/1902.09001.
- 8.
Figures 5–8 are given in the more complete version of the text at https://arxiv.org/abs/1902.09001.
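The notes above lend themselves to a short illustration. The following Python sketch is ours, not the authors' code: `round_coupling` is one plausible way to realize the rounding from note 3 (mixing the coupling \(\pi ^k\) with the independent coupling \(rc^\top \) preserves the marginals, shifts the cost by at most \(\gamma \Vert C\Vert _\infty = \varepsilon /2\), and for uniform marginals bounds each entry below by \(\varepsilon /(2n^2\Vert C\Vert _\infty )\)), while `proximal_sinkhorn` shows one reading of the Proximal Sinkhorn algorithm from the abstract, in which the KL-proximal step \(\pi ^{k+1} = \arg \min _{\pi \in U(r,c)}\langle C,\pi \rangle + L\,\mathrm{KL}(\pi \,\Vert \,\pi ^k)\) reduces to a Sinkhorn solve with kernel \(\pi ^k \odot \exp (-C/L)\). All names and iteration counts are illustrative; the authors' implementation is in the repository from note 6.

```python
import numpy as np

def round_coupling(pi, r, c, eps, C_inf):
    # Hypothetical rounding: mix with the independent coupling r c^T.
    # Marginals are preserved; for uniform r, c every entry is at least
    # eps / (2 n^2 C_inf), and the cost changes by at most eps / 2.
    gamma = eps / (2.0 * C_inf)
    return (1.0 - gamma) * pi + gamma * np.outer(r, c)

def sinkhorn(K, r, c, n_iters=1000):
    # Classical Sinkhorn scaling: find diag(u) K diag(v) with marginals r, c.
    u = np.ones_like(r)
    for _ in range(n_iters):
        v = c / (K.T @ u)
        u = r / (K @ v)
    return u[:, None] * K * v[None, :]

def proximal_sinkhorn(C, r, c, L, n_outer=50):
    # Outer proximal-point steps in KL divergence; each subproblem is an
    # entropic OT problem whose kernel is pi^k * exp(-C / L).
    pi = np.outer(r, c)              # feasible initial coupling
    for _ in range(n_outer):
        K = pi * np.exp(-C / L)
        pi = sinkhorn(K, r, c)
    return pi
```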
References
Altschuler, J., Bach, F., Rudi, A., Weed, J.: Approximating the quadratic transportation metric in near-linear time. arXiv:1810.10046 (2018)
Altschuler, J., Weed, J., Rigollet, P.: Near-linear time approximation algorithms for optimal transport via Sinkhorn iteration. In: Guyon, I., et al. (eds.) Advances in Neural Information Processing Systems 30, pp. 1961–1971. Curran Associates, Inc. (2017)
Arjovsky, M., Chintala, S., Bottou, L.: Wasserstein GAN. arXiv:1701.07875 (2017)
Ben-Tal, A., Nemirovski, A.: Lectures on modern convex optimization (lecture notes). Personal web-page of A. Nemirovski (2015). http://www2.isye.gatech.edu/~nemirovs/Lect_ModConvOpt.pdf
Benamou, J.D., Carlier, G., Cuturi, M., Nenna, L., Peyré, G.: Iterative Bregman projections for regularized transportation problems. SIAM J. Sci. Comput. 37(2), A1111–A1138 (2015)
Bigot, J., Klein, T., et al.: Consistent estimation of a population barycenter in the Wasserstein space. arXiv:1212.2562 (2012)
Blanchet, J., Jambulapati, A., Kent, C., Sidford, A.: Towards optimal running times for optimal transport. arXiv:1810.07717 (2018)
Bogolubsky, L., et al.: Learning supervised PageRank with gradient-based and gradient-free optimization methods. In: NIPS 2016 (2016). http://papers.nips.cc/paper/6565-learning-supervised-pagerank-with-gradient-based-and-gradient-free-optimization-methods.pdf
Cartis, C., Gould, N.I.M., Toint, P.L.: Improved second-order evaluation complexity for unconstrained nonlinear optimization using high-order regularized models. arXiv:1708.04044 (2018)
Chen, G., Teboulle, M.: Convergence analysis of a proximal-like minimization algorithm using Bregman functions. SIAM J. Optim. 3(3), 538–543 (1993)
Cohen, M.B., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. arXiv:1805.12591 (2018)
Cuturi, M.: Sinkhorn distances: lightspeed computation of optimal transport. In: Burges, C.J.C., Bottou, L., Welling, M., Ghahramani, Z., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 26, pp. 2292–2300. Curran Associates, Inc. (2013)
Cuturi, M., Doucet, A.: Fast computation of Wasserstein barycenters. In: Xing, E.P., Jebara, T. (eds.) Proceedings of the 31st International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 32, Beijing, China, 22–24 June 2014, pp. 685–693. PMLR (2014). http://proceedings.mlr.press/v32/cuturi14.html
d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008). https://doi.org/10.1137/060676386
Del Barrio, E., Lescornel, H., Loubes, J.M.: A statistical analysis of a deformation model with Wasserstein barycenters: estimation procedure and goodness of fit test. arXiv:1508.06465 (2015)
Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Math. Program. 146(1), 37–75 (2014). https://doi.org/10.1007/s10107-013-0677-5
Devolder, O., Glineur, F., Nesterov, Y., et al.: First-order methods with inexact oracle: the strongly convex case. CORE Discussion Papers 2013016 (2013)
Drusvyatskiy, D., Ioffe, A.D., Lewis, A.S.: Nonsmooth optimization using taylor-like models: error bounds, convergence, and termination criteria. arXiv:1610.03446 (2016)
Dvurechensky, P.: Gradient method with inexact oracle for composite non-convex optimization. arXiv:1703.09180 (2017)
Dvurechensky, P., Dvinskikh, D., Gasnikov, A., Uribe, C.A., Nedić, A.: Decentralize and randomize: faster algorithm for Wasserstein barycenters. In: Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., Garnett, R. (eds.) Advances in Neural Information Processing Systems 31, pp. 10783–10793. NeurIPS 2018, Curran Associates, Inc. (2018). arXiv:1806.03915
Dvurechensky, P., Gasnikov, A.: Stochastic intermediate gradient method for convex problems with stochastic inexact oracle. J. Optim. Theory Appl. 171(1), 121–145 (2016). https://doi.org/10.1007/s10957-016-0999-6
Dvurechensky, P., Gasnikov, A., Gorbunov, E.: An accelerated directional derivative method for smooth stochastic convex optimization. arXiv:1804.02394 (2018)
Dvurechensky, P., Gasnikov, A., Gorbunov, E.: An accelerated method for derivative-free smooth stochastic convex optimization. arXiv:1802.09022 (2018)
Dvurechensky, P., Gasnikov, A., Kamzolov, D.: Universal intermediate gradient method for convex problems with inexact oracle. arXiv:1712.06036 (2017)
Dvurechensky, P., Gasnikov, A., Kroshnin, A.: Computational optimal transport: complexity by accelerated gradient descent is better than by Sinkhorn’s algorithm. In: Dy, J., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 80, pp. 1367–1376 (2018). arXiv:1802.04367
Dvurechensky, P., Gasnikov, A., Stonyakin, F., Titov, A.: Generalized Mirror Prox: Solving variational inequalities with monotone operator, inexact oracle, and unknown Hölder parameters (2018). https://arxiv.org/abs/1806.05140
Dvurechensky, P., Gasnikov, A., Tiurin, A.: Randomized similar triangles method: a unifying framework for accelerated randomized optimization methods (coordinate descent, directional search, derivative-free method) (2017). https://arxiv.org/abs/1707.08486
Ebert, J., Spokoiny, V., Suvorikova, A.: Construction of non-asymptotic confidence sets in 2-Wasserstein space (2017). https://arxiv.org/abs/1703.03658
Gasnikov, A.: Universal gradient descent (2017). https://arxiv.org/abs/1711.00394
Gasnikov, A., et al.: Universal method with inexact oracle and its applications for searching equilibria in multistage transport problems (2015). https://arxiv.org/abs/1506.00292
Kantorovich, L.: On the translocation of masses. Doklady Acad. Sci. USSR (N.S.) 37(7–8), 227–229 (1942)
Kroshnin, A., Dvinskikh, D., Dvurechensky, P., Gasnikov, A., Tupitsa, N., Uribe, C.: On the complexity of approximating Wasserstein barycenter (2019). https://arxiv.org/abs/1901.08686
Kroshnin, A., Spokoiny, V., Suvorikova, A.: Statistical inference for Bures-Wasserstein barycenters (2019). https://arxiv.org/abs/1901.00226
Le Gouic, T., Loubes, J.M.: Existence and consistency of Wasserstein barycenters. Probab. Theory Relat. Fields 168(3–4), 901–917 (2017)
Lee, Y.T., Sidford, A.: Path finding methods for linear programming: solving linear programs in \(\widetilde{O}(\sqrt{\mathrm{rank}})\) iterations and faster algorithms for maximum flow. In: 2014 IEEE 55th Annual Symposium on Foundations of Computer Science (FOCS), pp. 424–433 (2014)
Lu, H., Freund, R.M., Nesterov, Y.: Relatively smooth convex optimization by first-order methods, and applications. SIAM J. Optim. 28(1), 333–354 (2018)
Mairal, J.: Optimization with first-order surrogate functions. In: International Conference on Machine Learning, pp. 783–791 (2013)
Monge, G.: Mémoire sur la théorie des déblais et des remblais. Histoire de l’Académie Royale des Sciences de Paris (1781)
Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course. Kluwer Academic Publishers, Massachusetts (2004)
Nesterov, Y.: Implementable tensor methods in unconstrained convex optimization. CORE Discussion Papers 2018005, Université catholique de Louvain, Center for Operations Research and Econometrics (CORE), March 2018. https://ideas.repec.org/p/cor/louvco/2018005.html
Nesterov, Y.: Soft clustering by convex electoral model. CORE Discussion Papers 2018001, Université catholique de Louvain, Center for Operations Research and Econometrics (CORE), January 2018. https://ideas.repec.org/p/cor/louvco/2018001.html
Nesterov, Y., Polyak, B.: Cubic regularization of Newton method and its global performance. Math. Program. 108(1), 177–205 (2006)
Ochs, P., Fadili, J., Brox, T.: Non-smooth non-convex Bregman minimization: unification and new algorithms. J. Optim. Theory Appl. 181(1), 244–278 (2019)
Pele, O., Werman, M.: Fast and robust earth mover’s distances. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 460–467 (2009)
Peyré, G., Cuturi, M.: Computational optimal transport. Found. Trends Mach. Learn. 11(5–6), 355–607 (2019)
Polyak, B.: Introduction to Optimization. Optimization Software, New York (1987)
Quanrud, K.: Approximating optimal transport with linear programs. In: 2nd Symposium on Simplicity in Algorithms (SOSA 2019), vol. 69, pp. 6:1–6:9. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, Dagstuhl, Germany (2018)
Schmitzer, B.: Stabilized sparse scaling algorithms for entropy regularized transport problems (2016). https://arxiv.org/abs/1610.06519
Sinkhorn, R.: Diagonal equivalence to matrices with prescribed row and column sums. II. Proc. Amer. Math. Soc. 45(2), 195–198 (1974)
Solomon, J., Rustamov, R.M., Guibas, L., Butscher, A.: Wasserstein propagation for semi-supervised learning. In: Proceedings of the 31st International Conference on Machine Learning, vol. 32, pp. 306–314. PMLR (2014)
Stonyakin, F., et al.: Gradient methods for problems with inexact model of the objective. arXiv:1902.09001 (2019)
Stonyakin, F., et al.: Inexact model: a framework for optimization and variational inequalities (2019). https://arxiv.org/abs/1902.00990
Tappenden, R., Richtárik, P., Gondzio, J.: Inexact coordinate descent: complexity and preconditioning. J. Optim. Theory Appl. 170(1), 144–176 (2016)
Tyurin, A., Gasnikov, A.: Fast gradient descent method for convex optimization problems with an oracle that generates a \((\delta, L)\)-model of a function at a requested point. Comput. Math. Math. Phys. (2019, accepted). https://arxiv.org/abs/1711.02747
Uribe, C.A., Dvinskikh, D., Dvurechensky, P., Gasnikov, A., Nedić, A.: Distributed computation of Wasserstein barycenters over networks. In: 2018 IEEE Conference on Decision and Control (CDC), pp. 6544–6549 (2018)
Acknowledgments
The work in Sects. 4 and 5 was funded by the Russian Science Foundation (project 18-71-10108). The work in Subsect. 2.1 and Sect. 3 was supported by the Russian Foundation for Basic Research (project 18-31-20005 mol_a_ved). The work of F. Stonyakin on Algorithm 2 and Theorem 2 was supported by the Russian Science Foundation (project 18-71-00048). The work of A. Gasnikov in Sect. 2 and the work of A. Kroshnin in Sect. 3 were supported within the framework of the HSE University Basic Research Program and funded by the Russian Academic Excellence Project “5-100”. The work of S. Artamonov in Sect. 3 was supported by the Academic Fund Program at the National Research University Higher School of Economics (HSE) in 2019–2020 (grant No. 19-01-024) and by the Russian Academic Excellence Project “5-100”.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Stonyakin, F.S., et al. (2019). Gradient Methods for Problems with Inexact Model of the Objective. In: Khachay, M., Kochetov, Y., Pardalos, P. (eds.) Mathematical Optimization Theory and Operations Research. MOTOR 2019. Lecture Notes in Computer Science, vol. 11548. Springer, Cham. https://doi.org/10.1007/978-3-030-22629-9_8
DOI: https://doi.org/10.1007/978-3-030-22629-9_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-22628-2
Online ISBN: 978-3-030-22629-9