Abstract
We consider optimization methods for convex minimization problems under inexact information on the objective function. We introduce an inexact model of the objective, which covers as particular cases the inexact oracle [16] and the relative smoothness condition [36]. We analyze a gradient method that uses this inexact model and obtain convergence rates for convex and strongly convex problems. To show potential applications of our general framework, we consider three particular problems. The first is clustering by the electoral model introduced in [41]. The second is approximating the optimal transport distance, for which we propose a Proximal Sinkhorn algorithm. The third is approximating the optimal transport barycenter, for which we propose a Proximal Iterative Bregman Projections algorithm. We also illustrate the practical performance of our algorithms by numerical experiments.
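As a minimal illustration of the framework (a sketch of ours, not the paper's code), the gradient method with an inexact \((\delta, L)\)-model replaces the usual linearization by a model \(\psi_\delta(y, x)\) and at each iterate minimizes \(\psi_\delta(y, x^k) + L V(y, x^k)\) over \(y\), where \(V\) is a Bregman divergence; see [75, 76] for the precise definitions. With \(\psi_\delta(y, x) = \langle \nabla f(x), y - x \rangle\) and \(V(y, x) = \frac{1}{2}\Vert y - x\Vert _2^2\) this recovers ordinary gradient descent. The function names below are illustrative.

```python
import numpy as np

def model_gradient_method(model_step, x0, n_iters=100):
    # Generic scheme: x_{k+1} = argmin_y psi(y, x_k) + L * V(y, x_k).
    # `model_step` is assumed to return that argmin at the current point.
    x = x0
    for _ in range(n_iters):
        x = model_step(x)
    return x

# Euclidean special case: psi(y, x) = <grad f(x), y - x> and
# V(y, x) = ||y - x||^2 / 2 give the step x - grad f(x) / L.
L = 4.0
grad_f = lambda x: 2.0 * x                    # gradient of f(x) = ||x||^2
euclidean_step = lambda x: x - grad_f(x) / L  # closed-form argmin of the model step
print(model_gradient_method(euclidean_step, np.ones(3)))  # -> near the minimizer 0
```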
Notes
- 1.
Here and below, \(f(n) = \widetilde{O}(g(n))\) means that for all (large) \(n\), \(f(n) \le \tilde{C}\cdot (\ln n)^r g(n)\) for some constants \(\tilde{C} > 0\) and \(r \ge 0\). Typically \(r = 1\), but not in this particular case. If \(r=0\), then \(\widetilde{O}(\cdot ) = O(\cdot )\).
- 2.
One can find the proof in Appendix E of the full version of the paper [51].
- 3.
This bound is rough, and \(\bar{c}_k\) is typically smaller in practice. By proper rounding of \(\pi ^k\) one can guarantee (without loss of generality) that \(\pi ^k_{ij}\ge \varepsilon /(2 n^2 \Vert C\Vert _\infty )\), which gives
$$ \frac{\bar{c}_k}{L} = \frac{\Vert C\Vert _\infty }{L} + \ln \left( \frac{2 n^2 \Vert C\Vert _\infty }{\varepsilon }\right) . $$In practice, however, there is often no need to round after each outer iteration; see the sketch after these notes for one possible rounding scheme.
- 4.
- 5.
Strictly speaking, at the moment we cannot verify all the details of the proof of the \(\tilde{O}(n^2/\varepsilon )\) estimate. Moreover, the methods proposed in [7, 47] are mainly theoretical, like the Lee-Sidford method for the OT problem with complexity \(\tilde{O}(n^{2.5})\) [35]. For now it is hardly possible to implement these methods so that their practical efficiency matches the theoretical one.
- 6.
The code is available at https://github.com/dmivilensky/Proximal-Sinkhorn-algorithm; a simplified sketch of the outer loop is given after these notes.
- 7.
Figures 5–8 are given in the more complete version of the text at https://arxiv.org/abs/1902.09001.
- 8.
Figures 5–8 are given in the more complete version of the text at https://arxiv.org/abs/1902.09001.
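The notes above lend themselves to a short illustration. The following Python sketch is ours, not the authors' code: `round_coupling` is one plausible way to realize the rounding from note 3 (mixing the coupling \(\pi ^k\) with the independent coupling \(rc^\top \) preserves the marginals, shifts the cost by at most \(\gamma \Vert C\Vert _\infty = \varepsilon /2\), and for uniform marginals bounds each entry below by \(\varepsilon /(2n^2\Vert C\Vert _\infty )\)), while `proximal_sinkhorn` shows one reading of the Proximal Sinkhorn algorithm from the abstract, in which the KL-proximal step \(\pi ^{k+1} = \arg \min _{\pi \in U(r,c)}\langle C,\pi \rangle + L\,\mathrm{KL}(\pi \,\Vert \,\pi ^k)\) reduces to a Sinkhorn solve with kernel \(\pi ^k \odot \exp (-C/L)\). All names and iteration counts are illustrative; the authors' implementation is in the repository from note 6.

```python
import numpy as np

def round_coupling(pi, r, c, eps, C_inf):
    # Hypothetical rounding: mix with the independent coupling r c^T.
    # Marginals are preserved; for uniform r, c every entry is at least
    # eps / (2 n^2 C_inf), and the cost changes by at most eps / 2.
    gamma = eps / (2.0 * C_inf)
    return (1.0 - gamma) * pi + gamma * np.outer(r, c)

def sinkhorn(K, r, c, n_iters=1000):
    # Classical Sinkhorn scaling: find diag(u) K diag(v) with marginals r, c.
    u = np.ones_like(r)
    for _ in range(n_iters):
        v = c / (K.T @ u)
        u = r / (K @ v)
    return u[:, None] * K * v[None, :]

def proximal_sinkhorn(C, r, c, L, n_outer=50):
    # Outer proximal-point steps in KL divergence; each subproblem is an
    # entropic OT problem whose kernel is pi^k * exp(-C / L).
    pi = np.outer(r, c)              # feasible initial coupling
    for _ in range(n_outer):
        K = pi * np.exp(-C / L)
        pi = sinkhorn(K, r, c)
    return pi
```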
References
Altschuler, J., Bach, F., Rudi, A., Weed, J.: Approximating the quadratic transportation metric in near-linear time. arXiv:1810.10046 (2018)
Altschuler, J., Weed, J., Rigollet, P.: Near-linear time approximation algorithms for optimal transport via Sinkhorn iteration. In: Guyon, I., et al. (eds.) Advances in Neural Information Processing Systems 30, pp. 1961–1971. Curran Associates, Inc. (2017)
Arjovsky, M., Chintala, S., Bottou, L.: Wasserstein GAN. arXiv:1701.07875 (2017)
Ben-Tal, A., Nemirovski, A.: Lectures on modern convex optimization (lecture notes). Personal web-page of A. Nemirovski (2015). http://www2.isye.gatech.edu/~nemirovs/Lect_ModConvOpt.pdf
Benamou, J.D., Carlier, G., Cuturi, M., Nenna, L., Peyré, G.: Iterative Bregman projections for regularized transportation problems. SIAM J. Sci. Comput. 37(2), A1111–A1138 (2015)
Bigot, J., Klein, T., et al.: Consistent estimation of a population barycenter in the Wasserstein space. arXiv:1212.2562 (2012)
Blanchet, J., Jambulapati, A., Kent, C., Sidford, A.: Towards optimal running times for optimal transport. arXiv:1810.07717 (2018)
Bogolubsky, L., et al.: Learning supervised PageRank with gradient-based and gradient-free optimization methods. In: NIPS 2016 (2016). http://papers.nips.cc/paper/6565-learning-supervised-pagerank-with-gradient-based-and-gradient-free-optimization-methods.pdf
Cartis, C., Gould, N.I.M., Toint, P.L.: Improved second-order evaluation complexity for unconstrained nonlinear optimization using high-order regularized models. arXiv:1708.04044 (2018)
Chen, G., Teboulle, M.: Convergence analysis of a proximal-like minimization algorithm using Bregman functions. SIAM J. Optim. 3(3), 538–543 (1993)
Cohen, M.B., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. arXiv:1805.12591 (2018)
Cuturi, M.: Sinkhorn distances: lightspeed computation of optimal transport. In: Burges, C.J.C., Bottou, L., Welling, M., Ghahramani, Z., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 26, pp. 2292–2300. Curran Associates, Inc. (2013)
Cuturi, M., Doucet, A.: Fast computation of Wasserstein barycenters. In: Xing, E.P., Jebara, T. (eds.) Proceedings of the 31st International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 32, Beijing, China, 22–24 June 2014, pp. 685–693. PMLR (2014). http://proceedings.mlr.press/v32/cuturi14.html
d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008). https://doi.org/10.1137/060676386
Del Barrio, E., Lescornel, H., Loubes, J.M.: A statistical analysis of a deformation model with Wasserstein barycenters: estimation procedure and goodness of fit test. arXiv:1508.06465 (2015)
Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Math. Program. 146(1), 37–75 (2014). https://doi.org/10.1007/s10107-013-0677-5
Devolder, O., Glineur, F., Nesterov, Y., et al.: First-order methods with inexact oracle: the strongly convex case. CORE Discussion Papers 2013016 (2013)
Drusvyatskiy, D., Ioffe, A.D., Lewis, A.S.: Nonsmooth optimization using taylor-like models: error bounds, convergence, and termination criteria. arXiv:1610.03446 (2016)
Dvurechensky, P.: Gradient method with inexact oracle for composite non-convex optimization. arXiv:1703.09180 (2017)
Dvurechensky, P., Dvinskikh, D., Gasnikov, A., Uribe, C.A., Nedić, A.: Decentralize and randomize: faster algorithm for Wasserstein barycenters. In: Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., Garnett, R. (eds.) Advances in Neural Information Processing Systems 31, pp. 10783–10793. NeurIPS 2018, Curran Associates, Inc. (2018). arXiv:1806.03915
Dvurechensky, P., Gasnikov, A.: Stochastic intermediate gradient method for convex problems with stochastic inexact oracle. J. Optim. Theory Appl. 171(1), 121–145 (2016). https://doi.org/10.1007/s10957-016-0999-6
Dvurechensky, P., Gasnikov, A., Gorbunov, E.: An accelerated directional derivative method for smooth stochastic convex optimization. arXiv:1804.02394 (2018)
Dvurechensky, P., Gasnikov, A., Gorbunov, E.: An accelerated method for derivative-free smooth stochastic convex optimization. arXiv:1802.09022 (2018)
Dvurechensky, P., Gasnikov, A., Kamzolov, D.: Universal intermediate gradient method for convex problems with inexact oracle. arXiv:1712.06036 (2017)
Dvurechensky, P., Gasnikov, A., Kroshnin, A.: Computational optimal transport: complexity by accelerated gradient descent is better than by Sinkhorn’s algorithm. In: Dy, J., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 80, pp. 1367–1376 (2018). arXiv:1802.04367
Dvurechensky, P., Gasnikov, A., Stonyakin, F., Titov, A.: Generalized Mirror Prox: Solving variational inequalities with monotone operator, inexact oracle, and unknown Hölder parameters (2018). https://arxiv.org/abs/1806.05140
Dvurechensky, P., Gasnikov, A., Tiurin, A.: Randomized similar triangles method: a unifying framework for accelerated randomized optimization methods (coordinate descent, directional search, derivative-free method) (2017). https://arxiv.org/abs/1707.08486
Ebert, J., Spokoiny, V., Suvorikova, A.: Construction of non-asymptotic confidence sets in 2-Wasserstein space (2017). https://arxiv.org/abs/1703.03658
Gasnikov, A.: Universal gradient descent (2017). https://arxiv.org/abs/1711.00394
Gasnikov, A., et al.: Universal method with inexact oracle and its applications for searching equilibria in multistage transport problems (2015). https://arxiv.org/abs/1506.00292
Kantorovich, L.: On the translocation of masses. Doklady Acad. Sci. USSR (N.S.) 37(7–8), 227–229 (1942)
Kroshnin, A., Dvinskikh, D., Dvurechensky, P., Gasnikov, A., Tupitsa, N., Uribe, C.: On the complexity of approximating Wasserstein barycenter (2019). https://arxiv.org/abs/1901.08686
Kroshnin, A., Spokoiny, V., Suvorikova, A.: Statistical inference for Bures-Wasserstein barycenters (2019). https://arxiv.org/abs/1901.00226
Le Gouic, T., Loubes, J.M.: Existence and consistency of Wasserstein barycenters. Probab. Theory Relat. Fields 168(3–4), 901–917 (2017)
Lee, Y.T., Sidford, A.: Path finding methods for linear programming: solving linear programs in \(\widetilde{O}(\sqrt{\mathrm{rank}})\) iterations and faster algorithms for maximum flow. In: 2014 IEEE 55th Annual Symposium on Foundations of Computer Science (FOCS), pp. 424–433 (2014)
Lu, H., Freund, R.M., Nesterov, Y.: Relatively smooth convex optimization by first-order methods, and applications. SIAM J. Optim. 28(1), 333–354 (2018)
Mairal, J.: Optimization with first-order surrogate functions. In: International Conference on Machine Learning, pp. 783–791 (2013)
Monge, G.: Mémoire sur la théorie des déblais et des remblais. Histoire de l’Académie Royale des Sciences de Paris (1781)
Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course. Kluwer Academic Publishers, Massachusetts (2004)
Nesterov, Y.: Implementable tensor methods in unconstrained convex optimization. CORE Discussion Papers 2018005, Université catholique de Louvain, Center for Operations Research and Econometrics (CORE), March 2018. https://ideas.repec.org/p/cor/louvco/2018005.html
Nesterov, Y.: Soft clustering by convex electoral model. CORE Discussion Papers 2018001, Université catholique de Louvain, Center for Operations Research and Econometrics (CORE), January 2018. https://ideas.repec.org/p/cor/louvco/2018001.html
Nesterov, Y., Polyak, B.: Cubic regularization of Newton method and its global performance. Math. Program. 108(1), 177–205 (2006)
Ochs, P., Fadili, J., Brox, T.: Non-smooth non-convex Bregman minimization: unification and new algorithms. J. Optim. Theory Appl. 181(1), 244–278 (2019)
Pele, O., Werman, M.: Fast and robust earth mover’s distances. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 460–467 (2009)
Peyré, G., Cuturi, M.: Computational optimal transport. Found. Trends Mach. Learn. 11(5–6), 355–607 (2019)
Polyak, B.: Introduction to Optimization. Optimization Software, New York (1987)
Quanrud, K.: Approximating optimal transport with linear programs. In: 2nd Symposium on Simplicity in Algorithms (SOSA 2019), vol. 69, pp. 6:1–6:9. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, Dagstuhl, Germany (2018)
Schmitzer, B.: Stabilized sparse scaling algorithms for entropy regularized transport problems (2016). https://arxiv.org/abs/1610.06519
Sinkhorn, R.: Diagonal equivalence to matrices with prescribed row and column sums. II. Proc. Amer. Math. Soc. 45(2), 195–198 (1974)
Solomon, J., Rustamov, R.M., Guibas, L., Butscher, A.: Wasserstein propagation for semi-supervised learning. In: Proceedings of the 31st International Conference on Machine Learning, vol. 32, pp. 306–314. PMLR (2014)
Stonyakin, F., et al.: Gradient methods for problems with inexact model of the objective. arXiv:1902.09001 (2019)
Stonyakin, F., et al.: Inexact model: a framework for optimization and variational inequalities (2019). https://arxiv.org/abs/1902.00990
Tappenden, R., Richtárik, P., Gondzio, J.: Inexact coordinate descent: complexity and preconditioning. J. Optim. Theory Appl. 170(1), 144–176 (2016)
Tyurin, A., Gasnikov, A.: Fast gradient descent method for convex optimization problems with an oracle that generates a \((\delta, L)\)-model of a function at a requested point. Comput. Math. Math. Phys. (2019, accepted). https://arxiv.org/abs/1711.02747
Uribe, C.A., Dvinskikh, D., Dvurechensky, P., Gasnikov, A., Nedić, A.: Distributed computation of Wasserstein barycenters over networks. In: 2018 IEEE Conference on Decision and Control (CDC), pp. 6544–6549 (2018)
Acknowledgments
The work in Sects. 4 and 5 was funded by the Russian Science Foundation (project 18-71-10108). The work in Subsect. 2.1 and Sect. 3 was supported by the Russian Foundation for Basic Research (project 18-31-20005 mol_a_ved). The work of F. Stonyakin on Algorithm 2 and Theorem 2 was supported by the Russian Science Foundation (project 18-71-00048). The work of A. Gasnikov in Sect. 2 and the work of A. Kroshnin in Sect. 3 were supported within the framework of the HSE University Basic Research Program and funded by the Russian Academic Excellence Project “5-100”. The work of S. Artamonov in Sect. 3 was supported by the Academic Fund Program at the National Research University Higher School of Economics (HSE) in 2019–2020 (grant No. 19-01-024) and by the Russian Academic Excellence Project “5-100”.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Stonyakin, F.S., et al. (2019). Gradient Methods for Problems with Inexact Model of the Objective. In: Khachay, M., Kochetov, Y., Pardalos, P. (eds.) Mathematical Optimization Theory and Operations Research. MOTOR 2019. Lecture Notes in Computer Science, vol. 11548. Springer, Cham. https://doi.org/10.1007/978-3-030-22629-9_8
DOI: https://doi.org/10.1007/978-3-030-22629-9_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-22628-2
Online ISBN: 978-3-030-22629-9