Gradient Methods for Problems with Inexact Model of the Objective

  • Conference paper
Mathematical Optimization Theory and Operations Research (MOTOR 2019)

Abstract

We consider optimization methods for convex minimization problems under inexact information on the objective function. We introduce an inexact model of the objective, which includes as particular cases the inexact oracle [16] and the relative smoothness condition [36]. We analyze a gradient method that uses this inexact model and obtain convergence rates for convex and strongly convex problems. To show potential applications of our general framework, we consider three particular problems. The first is clustering by the electoral model introduced in [41]. The second is approximating the optimal transport distance, for which we propose a Proximal Sinkhorn algorithm. The third is approximating the optimal transport barycenter, for which we propose a Proximal Iterative Bregman Projections algorithm. We also illustrate the practical performance of our algorithms by numerical experiments.
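
To make the setting concrete, the following minimal Python sketch runs a plain gradient method with step size 1/L when only a perturbed gradient is available. It is an illustration only and not the algorithm analyzed in the paper: the quadratic test problem, the additive gradient error of norm `noise_level`, and the names `inexact_gradient_descent` and `noisy_grad` are all assumptions introduced here, used as a simple stand-in for inexact information on the objective.

```python
import numpy as np


def inexact_gradient_descent(grad_oracle, x0, L, num_iters):
    """Run x_{k+1} = x_k - (1/L) * g_k, where g_k is an inexact gradient."""
    x = np.asarray(x0, dtype=float)
    for _ in range(num_iters):
        x = x - grad_oracle(x) / L
    return x


# Hypothetical test problem: f(x) = 0.5 * x^T A x - b^T x, A positive definite.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
eigenvalues = np.linalg.eigvalsh(A)  # ascending order
mu, L = eigenvalues[0], eigenvalues[-1]  # strong convexity and smoothness
x_star = np.linalg.solve(A, b)  # exact minimizer, used only for checking

noise_level = 1e-3  # assumed norm of the gradient error
rng = np.random.default_rng(0)


def noisy_grad(x):
    """Exact gradient A x - b plus a perturbation of norm noise_level."""
    e = rng.standard_normal(x.shape)
    e *= noise_level / np.linalg.norm(e)
    return A @ x - b + e


x_final = inexact_gradient_descent(noisy_grad, np.zeros(2), L, num_iters=200)
error = np.linalg.norm(x_final - x_star)
print(f"distance to minimizer: {error:.2e}")
```

For this toy problem the iterates do not converge exactly; they settle in a neighborhood of the minimizer whose radius is at most `noise_level / mu`, which shows in the simplest case how the quality of the available information limits the attainable accuracy.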

Notes

  1. Here and below for all (large) n: \(\widetilde{O}(g(n)) \le \tilde{C}\cdot (\ln n)^r g(n)\) with some constants \(\tilde{C} > 0\) and \(r \ge 0\). Typically, \(r = 1\), but not in this particular case. If \(r=0\), then \(\widetilde{O}(\cdot ) = O(\cdot )\).

  2. One can find the proof in Appendix E of the full version of the paper [51].

  3. This bound is rough, and typically \(\bar{c}_k\) is smaller in practice. By proper rounding of \(\pi ^k\) one can guarantee (without loss of generality) that \(\pi ^k_{ij}\ge \varepsilon /(2 n^2 \left\| C\right\| _\infty )\), which gives

    $$ \frac{\bar{c}_k}{L} = \frac{\left\| C\right\| _\infty }{L} + \ln \left( \frac{2 n^2 \left\| C\right\| _\infty }{\varepsilon }\right) . $$

    In practice, however, there is often no need to perform rounding after each outer iteration.

  4. Our experiments on MNIST data set show (see Figs. 23) that in practice the bound is better.

  5. Strictly speaking, at the moment we cannot verify all the details of the proof of the estimate \(\tilde{O}(n^2/\varepsilon )\). Also, the methods proposed in [7, 47] are mainly theoretical, like the Lee–Sidford method for the OT problem with complexity \(\tilde{O}(n^{2.5})\) [35]. At the moment it is hardly possible to implement these methods so that their practical efficiency matches the theoretical one.

  6. The code is available at https://github.com/dmivilensky/Proximal-Sinkhorn-algorithm.

  7. Figures 5–8 are given in the more complete version of the text, available at https://arxiv.org/abs/1902.09001

  8. Figures 5–8 are given in the more complete version of the text, available at https://arxiv.org/abs/1902.09001
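
As a worked illustration of the formula in note 3, the short Python sketch below evaluates the right-hand side for given problem parameters. The function name `rounding_bound_ratio` and the sample values of n, the cost norm, the accuracy, and L are hypothetical and chosen only for illustration; they do not come from the paper's experiments.

```python
import math


def rounding_bound_ratio(n, cost_inf_norm, eps, L):
    """Evaluate ||C||_inf / L + ln(2 n^2 ||C||_inf / eps) from note 3."""
    return cost_inf_norm / L + math.log(2 * n**2 * cost_inf_norm / eps)


# Hypothetical values: n = 10, ||C||_inf = 1, eps = 0.2, L = 1.
ratio = rounding_bound_ratio(n=10, cost_inf_norm=1.0, eps=0.2, L=1.0)
print(f"c_bar_k / L = {ratio:.4f}")
```

With these sample values the logarithm's argument equals 1000, so the ratio is 1 + ln(1000). The dependence on n and on the accuracy is only logarithmic, while the first term scales with the cost norm divided by L.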

References

  1. Altschuler, J., Bach, F., Rudi, A., Weed, J.: Approximating the quadratic transportation metric in near-linear time. arXiv:1810.10046 (2018)

  2. Altschuler, J., Weed, J., Rigollet, P.: Near-linear time approximation algorithms for optimal transport via Sinkhorn iteration. In: Guyon, I., et al. (eds.) Advances in Neural Information Processing Systems 30, pp. 1961–1971. Curran Associates, Inc. (2017)

  3. Arjovsky, M., Chintala, S., Bottou, L.: Wasserstein GAN. arXiv:1701.07875 (2017)

  4. Ben-Tal, A., Nemirovski, A.: Lectures on modern convex optimization (lecture notes). Personal web-page of A. Nemirovski (2015). http://www2.isye.gatech.edu/~nemirovs/Lect_ModConvOpt.pdf

  5. Benamou, J.D., Carlier, G., Cuturi, M., Nenna, L., Peyré, G.: Iterative Bregman projections for regularized transportation problems. SIAM J. Sci. Comput. 37(2), A1111–A1138 (2015)

  6. Bigot, J., Klein, T., et al.: Consistent estimation of a population barycenter in the Wasserstein space. arXiv:1212.2562 (2012)

  7. Blanchet, J., Jambulapati, A., Kent, C., Sidford, A.: Towards optimal running times for optimal transport. arXiv:1810.07717 (2018)

  8. Bogolubsky, L., et al.: Learning supervised PageRank with gradient-based and gradient-free optimization methods. In: NIPS 2016 (2016). http://papers.nips.cc/paper/6565-learning-supervised-pagerank-with-gradient-based-and-gradient-free-optimization-methods.pdf

  9. Cartis, C., Gould, N.I.M., Toint, P.L.: Improved second-order evaluation complexity for unconstrained nonlinear optimization using high-order regularized models. arXiv:1708.04044 (2018)

  10. Chen, G., Teboulle, M.: Convergence analysis of a proximal-like minimization algorithm using Bregman functions. SIAM J. Optim. 3(3), 538–543 (1993)

  11. Cohen, M.B., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. arXiv:1805.12591 (2018)

  12. Cuturi, M.: Sinkhorn distances: lightspeed computation of optimal transport. In: Burges, C.J.C., Bottou, L., Welling, M., Ghahramani, Z., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 26, pp. 2292–2300. Curran Associates, Inc. (2013)

  13. Cuturi, M., Doucet, A.: Fast computation of Wasserstein barycenters. In: Xing, E.P., Jebara, T. (eds.) Proceedings of the 31st International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 32, Beijing, China, 22–24 June 2014, pp. 685–693. PMLR (2014). http://proceedings.mlr.press/v32/cuturi14.html

  14. d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008). https://doi.org/10.1137/060676386

  15. Del Barrio, E., Lescornel, H., Loubes, J.M.: A statistical analysis of a deformation model with Wasserstein barycenters: estimation procedure and goodness of fit test. arXiv:1508.06465 (2015)

  16. Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Math. Program. 146(1), 37–75 (2014). https://doi.org/10.1007/s10107-013-0677-5

  17. Devolder, O., Glineur, F., Nesterov, Y., et al.: First-order methods with inexact oracle: the strongly convex case. CORE Discussion Papers 2013016 (2013)

  18. Drusvyatskiy, D., Ioffe, A.D., Lewis, A.S.: Nonsmooth optimization using Taylor-like models: error bounds, convergence, and termination criteria. arXiv:1610.03446 (2016)

  19. Dvurechensky, P.: Gradient method with inexact oracle for composite non-convex optimization. arXiv:1703.09180 (2017)

  20. Dvurechensky, P., Dvinskikh, D., Gasnikov, A., Uribe, C.A., Nedić, A.: Decentralize and randomize: faster algorithm for Wasserstein barycenters. In: Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., Garnett, R. (eds.) Advances in Neural Information Processing Systems 31, pp. 10783–10793. NeurIPS 2018, Curran Associates, Inc. (2018). arXiv:1802.04367

  21. Dvurechensky, P., Gasnikov, A.: Stochastic intermediate gradient method for convex problems with stochastic inexact oracle. J. Optim. Theory Appl. 171(1), 121–145 (2016). https://doi.org/10.1007/s10957-016-0999-6

  22. Dvurechensky, P., Gasnikov, A., Gorbunov, E.: An accelerated directional derivative method for smooth stochastic convex optimization. arXiv:1804.02394 (2018)

  23. Dvurechensky, P., Gasnikov, A., Gorbunov, E.: An accelerated method for derivative-free smooth stochastic convex optimization. arXiv:1802.09022 (2018)

  24. Dvurechensky, P., Gasnikov, A., Kamzolov, D.: Universal intermediate gradient method for convex problems with inexact oracle. arXiv:1712.06036 (2017)

  25. Dvurechensky, P., Gasnikov, A., Kroshnin, A.: Computational optimal transport: complexity by accelerated gradient descent is better than by Sinkhorn’s algorithm. In: Dy, J., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 80, pp. 1367–1376 (2018). arXiv:1802.04367

  26. Dvurechensky, P., Gasnikov, A., Stonyakin, F., Titov, A.: Generalized Mirror Prox: Solving variational inequalities with monotone operator, inexact oracle, and unknown Hölder parameters (2018). https://arxiv.org/abs/1806.05140

  27. Dvurechensky, P., Gasnikov, A., Tiurin, A.: Randomized similar triangles method: a unifying framework for accelerated randomized optimization methods (coordinate descent, directional search, derivative-free method) (2017). https://arxiv.org/abs/1707.08486

  28. Ebert, J., Spokoiny, V., Suvorikova, A.: Construction of non-asymptotic confidence sets in 2-Wasserstein space (2017). https://arxiv.org/abs/1703.03658

  29. Gasnikov, A.: Universal gradient descent (2017). https://arxiv.org/abs/1711.00394

  30. Gasnikov, A., et al.: Universal method with inexact oracle and its applications for searching equillibriums in multistage transport problems (2015). https://arxiv.org/abs/1506.00292

  31. Kantorovich, L.: On the translocation of masses. Doklady Acad. Sci. USSR (N.S.) 37(7–8), 227–229 (1942)

  32. Kroshnin, A., Dvinskikh, D., Dvurechensky, P., Gasnikov, A., Tupitsa, N., Uribe, C.: On the complexity of approximating Wasserstein barycenter (2019). https://arxiv.org/abs/1901.08686

  33. Kroshnin, A., Spokoiny, V., Suvorikova, A.: Statistical inference for Bures-Wasserstein barycenters (2019). https://arxiv.org/abs/1901.00226

  34. Le Gouic, T., Loubes, J.M.: Existence and consistency of Wasserstein barycenters. Probab. Theory Relat. Fields 168(3–4), 901–917 (2017)

  35. Lee, Y.T., Sidford, A.: Path finding methods for linear programming: solving linear programs in \(\widetilde{O}(\sqrt{\mathrm{rank}})\) iterations and faster algorithms for maximum flow. In: 2014 IEEE 55th Annual Symposium on Foundations of Computer Science (FOCS), pp. 424–433 (2014)

  36. Lu, H., Freund, R.M., Nesterov, Y.: Relatively smooth convex optimization by first-order methods, and applications. SIAM J. Optim. 28(1), 333–354 (2018)

  37. Mairal, J.: Optimization with first-order surrogate functions. In: International Conference on Machine Learning, pp. 783–791 (2013)

  38. Monge, G.: Mémoire sur la théorie des déblais et des remblais. Histoire de l’Académie Royale des Sciences de Paris (1781)

  39. Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course. Kluwer Academic Publishers, Massachusetts (2004)

  40. Nesterov, Y.: Implementable tensor methods in unconstrained convex optimization. CORE Discussion Papers 2018005, Université catholique de Louvain, Center for Operations Research and Econometrics (CORE), March 2018. https://ideas.repec.org/p/cor/louvco/2018005.html

  41. Nesterov, Y.: Soft clustering by convex electoral model. CORE Discussion Papers 2018001, Université catholique de Louvain, Center for Operations Research and Econometrics (CORE), January 2018. https://ideas.repec.org/p/cor/louvco/2018001.html

  42. Nesterov, Y., Polyak, B.: Cubic regularization of Newton method and its global performance. Math. Program. 108(1), 177–205 (2006)

  43. Ochs, P., Fadili, J., Brox, T.: Non-smooth non-convex Bregman minimization: unification and new algorithms. J. Optim. Theory Appl. 181(1), 244–278 (2019)

  44. Pele, O., Werman, M.: Fast and robust earth mover’s distances. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 460–467 (2009)

  45. Peyré, G., Cuturi, M.: Computational optimal transport. Found. Trends Mach. Learn. 11(5–6), 355–607 (2019)

  46. Polyak, B.: Introduction to Optimization. Optimization Software, New York (1987)

  47. Quanrud, K.: Approximating optimal transport with linear programs. In: 2nd Symposium on Simplicity in Algorithms (SOSA 2019), vol. 69, pp. 6:1–6:9. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, Dagstuhl, Germany (2018)

  48. Schmitzer, B.: Stabilized Sparse Scaling Algorithms for Entropy Regularized Transport Problems (2016). https://arxiv.org/abs/1610.06519

  49. Sinkhorn, R.: Diagonal equivalence to matrices with prescribed row and column sums. II. Proc. Amer. Math. Soc. 45(2), 195–198 (1974)

  50. Solomon, J., Rustamov, R.M., Guibas, L., Butscher, A.: Wasserstein propagation for semi-supervised learning. In: Proceedings of the 31st International Conference on Machine Learning, vol. 32, pp. 306–314. PMLR (2014)

  51. Stonyakin, F., et al.: Gradient methods for problems with inexact model of the objective. arXiv:1902.09001 (2019)

  52. Stonyakin, F., et al.: Inexact Model: A Framework for Optimization and Variational Inequalities (2019). https://arxiv.org/abs/1902.00990

  53. Tappenden, R., Richtárik, P., Gondzio, J.: Inexact coordinate descent: complexity and preconditioning. J. Optim. Theory Appl. 170(1), 144–176 (2016)

  54. Tyurin, A., Gasnikov, A.: Fast gradient descent method for convex optimization problems with an oracle that generates a \((\delta , L)\)-model of a function in a requested point. Comput. Math. Math. Phys. (2019, accepted). https://arxiv.org/abs/1711.02747

  55. Uribe, C.A., Dvinskikh, D., Dvurechensky, P., Gasnikov, A., Nedić, A.: Distributed computation of Wasserstein barycenters over networks. In: 2018 IEEE Conference on Decision and Control (CDC), pp. 6544–6549 (2018)

Acknowledgments

The work in Sects. 4 and 5 was funded by the Russian Science Foundation (project 18-71-10108). The work in Subsect. 2.1 and Sect. 3 was supported by the Russian Foundation for Basic Research, grant 18-31-20005 mol_a_ved. The work of F. Stonyakin on Algorithm 2 and Theorem 2 was supported by the Russian Science Foundation (project 18-71-00048). The work of A. Gasnikov in Sect. 2 was supported within the framework of the HSE University Basic Research Program and funded by the Russian Academic Excellence Project “5-100”. The work of A. Kroshnin in Sect. 3 was supported within the framework of the HSE University Basic Research Program and funded by the Russian Academic Excellence Project “5-100”. The work of S. Artamonov in Sect. 3 was supported by the Academic Fund Program at the National Research University Higher School of Economics (HSE) in 2019–2020 (grant No 19-01-024) and by the Russian Academic Excellence Project “5-100”.

Author information

Corresponding author

Correspondence to Fedor S. Stonyakin.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Stonyakin, F.S. et al. (2019). Gradient Methods for Problems with Inexact Model of the Objective. In: Khachay, M., Kochetov, Y., Pardalos, P. (eds) Mathematical Optimization Theory and Operations Research. MOTOR 2019. Lecture Notes in Computer Science, vol 11548. Springer, Cham. https://doi.org/10.1007/978-3-030-22629-9_8

  • DOI: https://doi.org/10.1007/978-3-030-22629-9_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-22628-2

  • Online ISBN: 978-3-030-22629-9

  • eBook Packages: Computer Science, Computer Science (R0)
