Abstract
We introduce an inexact oracle model for variational inequalities with monotone operators, propose a numerical method for solving such variational inequalities, and analyze its convergence rate. As a particular case, we consider variational inequalities with a Hölder-continuous operator and show that our algorithm is universal: without knowing the Hölder exponent or the Hölder constant, it achieves the lowest possible worst-case complexity for this class of variational inequalities. We also consider variational inequalities with a strongly monotone operator and generalize both our method for variational inequalities with inexact oracle and our universal method to this class of problems. Finally, we show how our method can be applied to convex–concave saddle point problems with Hölder-continuous partial subgradients.
References
Alkousa, M.S., Gasnikov, A.V., Dvinskikh, D.M., Kovalev, D.A., Stonyakin, F.S.: Accelerated methods for saddle-point problem. Comput. Math. Math. Phys. 60, 1787–1809 (2020)
Antonakopoulos, K., Belmega, V., Mertikopoulos, P.: Adaptive extra-gradient methods for min-max optimization and games. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=R0a0kFI3dJx
Arjovsky, M., Chintala, S., Bottou, L.: Wasserstein GAN. arXiv:1701.07875 (2017)
Auslender, A., Teboulle, M.: Interior projection-like methods for monotone variational inequalities. Math. Program. 104(1), 39–68 (2005)
Aybat, N.S., Fallah, A., Gurbuzbalaban, M., Ozdaglar, A.: Robust accelerated gradient methods for smooth strongly convex functions. SIAM J. Optim. 30(1), 717–751 (2020)
Bach, F., Levy, K.Y.: A universal algorithm for variational inequalities adaptive to smoothness and noise. In: Beygelzimer, A., Hsu, D. (eds.) Proceedings of the 32nd Conference on Learning Theory, Proceedings of Machine Learning Research, vol. 99, pp. 164–194. PMLR, Phoenix (2019). http://proceedings.mlr.press/v99/bach19a.html. ArXiv:1902.01637
Baimurzina, D.R., Gasnikov, A.V., Gasnikova, E.V., Dvurechensky, P.E., Ershov, E.I., Kubentaeva, M.B., Lagunovskaya, A.A.: Universal method of searching for equilibria and stochastic equilibria in transportation networks. Comput. Math. Math. Phys. 59(1), 19–33 (2019)
Bayandina, A., Dvurechensky, P., Gasnikov, A., Stonyakin, F., Titov, A.: Mirror descent and convex optimization problems with non-smooth inequality constraints. In: Giselsson, P., Rantzer, A. (eds.) Large-Scale and Distributed Optimization, Chap. 8, pp. 181–215. Springer (2018)
Ben-Tal, A., Nemirovski, A.: Lectures on Modern Convex Optimization (Lecture Notes). Personal web-page of A. Nemirovski (2015). https://www2.isye.gatech.edu/~nemirovs/LMCO_LN.pdf
Beznosikov, A., Dvurechensky, P., Koloskova, A., Samokhin, V., Stich, S.U., Gasnikov, A.: Decentralized local stochastic extra-gradient for variational inequalities. arXiv:2106.08315 (2021)
Bogolubsky, L., Dvurechensky, P., Gasnikov, A., Gusev, G., Nesterov, Y., Raigorodskii, A.M., Tikhonov, A., Zhukovskii, M.: Learning supervised pagerank with gradient-based and gradient-free optimization methods. In: Lee, D.D., Sugiyama, M., Luxburg, U.V., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29, pp. 4914–4922. Curran Associates, Inc. (2016). ArXiv:1603.00717
Bullins, B., Lai, K.A.: Higher-order methods for convex-concave min-max optimization and monotone variational inequalities. arXiv:2007.04528 (2020)
Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, Proceedings of Machine Learning Research, vol. 80, pp. 1019–1028. PMLR, Stockholmsmässan, Stockholm (2018). ArXiv:1805.12591
Dang, C.D., Lan, G.: On the convergence properties of non-Euclidean extragradient methods for variational inequalities with generalized monotone operators. Comput. Optim. Appl. 60(2), 277–310 (2015)
d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008)
Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Math. Program. 146(1), 37–75 (2014)
Dvinskikh, D., Gasnikov, A.: Decentralized and parallel primal and dual accelerated methods for stochastic convex programming problems. J. Inverse Ill-posed Probl. 29(3), 385–405 (2021). https://doi.org/10.1515/jiip-2020-0068
Dvinskikh, D., Ogaltsov, A., Gasnikov, A., Dvurechensky, P., Spokoiny, V.: On the line-search gradient methods for stochastic optimization. IFAC-PapersOnLine 53(2), 1715–1720 (2020). https://doi.org/10.1016/j.ifacol.2020.12.2284. 21th IFAC World Congress arXiv:1911.08380
Dvurechensky, P.: Gradient method with inexact oracle for composite non-convex optimization. Comput. Res. Model. 14(2), 321–334 (2022). https://doi.org/10.20537/2076-7633-2022-14-2-321-334
Dvurechensky, P., Gasnikov, A.: Stochastic intermediate gradient method for convex problems with stochastic inexact oracle. J. Optim. Theory Appl. 171(1), 121–145 (2016)
Dvurechensky, P., Gorbunov, E., Gasnikov, A.: An accelerated directional derivative method for smooth stochastic convex optimization. Eur. J. Oper. Res. 290(2), 601–621 (2021) https://doi.org/10.1016/j.ejor.2020.08.027
Dvurechensky, P., Nesterov, Y., Spokoiny, V.: Primal-dual methods for solving infinite-dimensional games. J. Optim. Theory Appl. 166(1), 23–51 (2015)
Dvurechensky, P.E., Ivanov, G.E.: Algorithms for computing Minkowski operators and their application in differential games. Comput. Math. Math. Phys. 54(2), 235–264 (2014)
Facchinei, F., Pang, J.S.: Finite-Dimensional Variational Inequalities and Complementarity Problems. Springer (2007)
Gasnikov, A., Dvurechensky, P., Gorbunov, E., Vorontsova, E., Selikhanovych, D., Uribe, C.A., Jiang, B., Wang, H., Zhang, S., Bubeck, S., Jiang, Q., Lee, Y.T., Li, Y., Sidford, A.: Near optimal methods for minimizing convex functions with Lipschitz \(p\)-th derivatives. In: Beygelzimer, A., Hsu, D. (eds.) Proceedings of the 32nd Conference on Learning Theory, Proceedings of Machine Learning Research, vol. 99, pp. 1392–1393. PMLR, Phoenix (2019)
Gasnikov, A.V., Dvinskikh, D.M., Dvurechensky, P.E., Kamzolov, D.I., Matyukhin, V.V., Pasechnyuk, D.A., Tupitsa, N.K., Chernov, A.V.: Accelerated meta-algorithm for convex optimization problems. Comput. Math. Math. Phys. 61(1), 17–28 (2021). https://doi.org/10.1134/S096554252101005X
Gasnikov, A.V., Dvurechensky, P.E.: Stochastic intermediate gradient method for convex optimization problems. Dokl. Math. 93(2), 148–151 (2016)
Gasnikov, A.V., Dvurechensky, P.E., Stonyakin, F.S., Titov, A.A.: An adaptive proximal method for variational inequalities. Comput. Math. Math. Phys. 59(5), 836–841 (2019)
Gasnikov, A.V., Nesterov, Y.E.: Universal method for stochastic composite optimization problems. Comput. Math. Math. Phys. 58(1), 48–64 (2018)
Ghadimi, S., Lan, G., Zhang, H.: Generalized uniformly optimal methods for nonlinear programming. J. Sci. Comput. 79(3), 1854–1881 (2019)
Giannessi, F.: On Minty variational principle. New Trends in Mathematical Programming. Appl. Optim. 13, 93–99 (1997)
Gladin, E., Sadiev, A., Gasnikov, A., Dvurechensky, P., Beznosikov, A., Alkousa, M.: Solving smooth min-min and min-max problems by mixed oracle algorithms. In: Strekalovsky, A., Kochetov, Y., Gruzdeva, T., Orlov A. (eds.) Mathematical Optimization Theory and Operations Research: Recent Trends, pp. 19–40. Springer, Cham (2021). ArXiv:2103.00434
Gorbunov, E., Danilova, M., Shibaev, I., Dvurechensky, P., Gasnikov, A.: Near-optimal high probability complexity bounds for non-smooth stochastic optimization with heavy-tailed noise. arXiv:2106.05958 (2021)
Gorbunov, E., Dvurechensky, P., Gasnikov, A.: An accelerated method for derivative-free smooth stochastic convex optimization. SIAM J. Optim. 32(2), 1210–1238 (2022). https://doi.org/10.1137/19M1259225
Guminov, S., Gasnikov, A., Anikin, A., Gornov, A.: A universal modification of the linear coupling method. Optim. Methods Softw. 34(3), 560–577 (2019)
Guminov, S.V., Nesterov, Y.E., Dvurechensky, P.E., Gasnikov, A.V.: Accelerated primal-dual gradient descent with linesearch for convex, nonconvex, and nonsmooth optimization problems. Dokl. Math. 99(2), 125–128 (2019)
Harker, P.T., Pang, J.S.: Finite-dimensional variational inequality and nonlinear complementarity problems: a survey of theory, algorithms and applications. Math. Program. 48(1–3), 161–220 (1990)
Juditsky, A., Nemirovski, A.: First order methods for nonsmooth convex large scale optimization, I: general purpose methods. Optim. Mach. Learn. 30(9), 121–148 (2011)
Kamzolov, D., Dvurechensky, P., Gasnikov, A.: Universal intermediate gradient method for convex problems with inexact oracle. Optim. Methods Softw. 36(6), 1289–1316 (2021). https://doi.org/10.1080/10556788.2019.1711079
Khanh, P.D., Vuong, P.T.: Modified projection method for strongly pseudomonotone variational inequalities. J. Glob. Optim. 58(2), 341–350 (2014)
Kniaz, V.V., Knyaz, V.A., Mizginov, V., Papazyan, A., Fomin, N., Grodzitsky, L.: Adversarial dataset augmentation using reinforcement learning and 3d modeling. In: Kryzhanovsky, B., Dunin-Barkowski, W., Redko, V., Tiumentsev, Y. (eds.) Advances in Neural Computation, Machine Learning, and Cognitive Research IV, pp. 316–329. Springer, Cham (2021)
Korpelevich, G.: The extragradient method for finding saddle points and other problems. Ekonomika i Matematicheskie Metody 12, 747–756 (1976)
Koshal, J., Nedić, A., Shanbhag, U.: Multiuser optimization: Distributed algorithms and error analysis. SIAM J. Optim. 21(3), 1046–1081 (2011)
Monteiro, R.D., Svaiter, B.F.: On the complexity of the hybrid proximal extragradient method for the iterates and the ergodic mean. SIAM J. Optim. 20(6), 2755–2787 (2010)
Nemirovski, A.: Prox-method with rate of convergence \(O(1/t)\) for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems. SIAM J. Optim. 15(1), 229–251 (2004)
Nemirovsky, A., Yudin, D.: Problem Complexity and Method Efficiency in Optimization. Wiley, New York (1983)
Nesterov, Y.: A method of solving a convex programming problem with convergence rate \(O(1/k^2)\). Sov. Math. Dokl. 27(2), 372–376 (1983)
Nesterov, Y.: Dual extrapolation and its applications to solving variational inequalities and related problems. Math. Program. 109(2–3), 319–344 (2007). First appeared in 2003 as CORE discussion paper 2003/68
Nesterov, Y.: Universal gradient methods for convex optimization problems. Math. Program. 152(1), 381–404 (2015)
Nesterov, Y.: Implementable tensor methods in unconstrained convex optimization. Math. Program. 186(1), 157–183 (2021). https://doi.org/10.1007/s10107-019-01449-1
Nesterov, Y., Gasnikov, A., Guminov, S., Dvurechensky, P.: Primal-dual accelerated gradient methods with small-dimensional relaxation oracle. Optim. Methods Softw. 36(4), 1–28 (2020)
Nesterov, Y., Scrimali, L.: Solving strongly monotone variational and quasi-variational inequalities. Discrete Contin. Dyn. Syst. Ser. A 31(4), 1383–1396 (2011)
Ostroukhov, P., Kamalov, R., Dvurechensky, P., Gasnikov, A.: Tensor methods for strongly convex strongly concave saddle point problems and strongly monotone variational inequalities. arXiv:2012.15595 (2020)
Ouyang, Y., Xu, Y.: Lower complexity bounds of first-order methods for convex-concave bilinear saddle-point problems. Math. Program. 185(1), 1–35 (2021). https://doi.org/10.1007/s10107-019-01420-0
Polyak, B.: A general method of solving extremum problems. Sov. Math. Dokl. 8(3), 593–597 (1967)
Rogozin, A., Beznosikov, A., Dvinskikh, D., Kovalev, D., Dvurechensky, P., Gasnikov, A.: Decentralized distributed optimization for saddle point problems. arXiv:2102.07758 (2021)
Sadiev, A., Beznosikov, A., Dvurechensky, P., Gasnikov, A.: Zeroth-order algorithms for smooth saddle-point problems. In: Strekalovsky, A., Kochetov, Y., Gruzdeva, T., Orlov, A. (eds.) Mathematical Optimization Theory and Operations Research: Recent Trends, pp. 71–85. Springer, Cham (2021). ArXiv:2009.09908
Shibaev, I., Dvurechensky, P., Gasnikov, A.: Zeroth-order methods for noisy Hölder-gradient functions. Optim. Lett. (2021). https://doi.org/10.1007/s11590-021-01742-z
Solodov, M., Svaiter, B.: A hybrid approximate extragradient-proximal point algorithm using the enlargement of a maximal monotone operator. Set-Valued Anal. 7(4), 323–345 (1999)
Stonyakin, F., Tyurin, A., Gasnikov, A., Dvurechensky, P., Agafonov, A., Dvinskikh, D., Alkousa, M., Pasechnyuk, D., Artamonov, S., Piskunova, V.: Inexact model: a framework for optimization and variational inequalities. Optim. Methods Softw. 36(6), 1155–1201 (2021). https://doi.org/10.1080/10556788.2021.1924714
Stonyakin, F.S., Dvinskikh, D., Dvurechensky, P., Kroshnin, A., Kuznetsova, O., Agafonov, A., Gasnikov, A., Tyurin, A., Uribe, C.A., Pasechnyuk, D., Artamonov, S.: Gradient methods for problems with inexact model of the objective. In: Khachay, M., Kochetov, Y., Pardalos, P. (eds.) Mathematical Optimization Theory and Operations Research, pp. 97–114. Springer, Cham (2019). ArXiv:1902.09001
Tiapkin, D., Gasnikov, A., Dvurechensky, P.: Stochastic saddle-point optimization for the Wasserstein barycenter problem. Optim. Lett. (2022). https://doi.org/10.1007/s11590-021-01834-w
Titov, A., Stonyakin, F., Alkousa, M., Gasnikov, A.: Algorithms for solving variational inequalities and saddle point problems with some generalizations of Lipschitz property for operators. In: Strekalovsky, A., Kochetov, Y., Gruzdeva, T., Orlov, A. (eds.) Mathematical Optimization Theory and Operations Research, pp. 86–101. Springer, Cham (2021)
Tominin, V., Tominin, Y., Borodich, E., Kovalev, D., Gasnikov, A., Dvurechensky, P.: On accelerated methods for saddle-point problems with composite structure. arXiv:2103.09344 (2021)
Zhang, J., Hong, M., Zhang, S.: On lower iteration complexity bounds for the saddle point problems. Math. Program. 194, 901–935 (2022). https://doi.org/10.1007/s10107-021-01660-z
Acknowledgements
The authors are grateful to Yurii Nesterov for fruitful discussions. The research by A. Gasnikov in Section 3 was supported by the Ministry of Science and Higher Education of the Russian Federation (Goszadaniye) No. 075-00337-20-03, project No. 0714-2020-0005. The research in Section 6 and Appendix B was supported by the Russian Science Foundation (project 18-71-00048).
Communicated by Nguyen Dong Yen.
Appendices
Appendix A
Proof of Lemma 3.1
Proof
Let us fix some \(\nu \in [0,1]\). Then, for any \(x \in [0,1]\), we have \(x^{2\nu }\le 1\). On the other hand, for any \(x \ge 1\), we have \(x^{2\nu }\le x^{2}\). Thus, for any \(x \ge 0\), we have \(x^{2\nu }\le x^2+ 1\). Hence, for any \(\alpha , \beta \ge 0\),
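The elementary inequality \(x^{2\nu }\le x^2+1\) is easy to sanity-check numerically. The following standalone snippet (our illustration, not part of the paper) verifies it on a grid of \(x\) and \(\nu \) values, mirroring the two cases of the proof:

```python
import numpy as np

# Verify x^(2*nu) <= x^2 + 1 for x >= 0 and nu in [0, 1].
# Proof cases: x in [0,1] gives x^(2*nu) <= 1, while x >= 1
# gives x^(2*nu) <= x^2; together, x^(2*nu) <= x^2 + 1.
xs = np.linspace(0.0, 10.0, 2001)
nus = np.linspace(0.0, 1.0, 101)
X, NU = np.meshgrid(xs, nus)

lhs = X ** (2 * NU)
rhs = X ** 2 + 1.0
assert np.all(lhs <= rhs + 1e-12)
print("inequality verified on the grid")
```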
Substituting \(\alpha =\frac{ba^{\frac{1}{1+\nu }}}{\delta ^{\frac{1}{1+\nu }}}\) and \(\beta =\frac{ca^{\frac{1}{1+\nu }}}{\delta ^{\frac{1}{1+\nu }}}\), we obtain
and
Appendix B
To demonstrate the practical performance of the proposed Algorithm 1, we performed a series of numerical experiments on the Lagrange saddle point problem induced by the Fermat–Torricelli–Steiner problem.
All experiments were run in Python 3.4 on a computer with an Intel(R) Core(TM) i7-8550U CPU @ 1.80 GHz (4 cores, 8 logical processors) and 8 GB RAM.
We consider an example of a variational inequality with a non-smooth (i.e., \(\nu = 0\)) and non-strongly monotone operator. For this VI, the proposed universal method, thanks to its adaptivity to the smoothness level of the problem, works in practice with iteration complexity much smaller than the theory predicts. The example is inspired by the well-known Fermat–Torricelli–Steiner problem, to which we add non-smooth functional constraints. This problem can be solved by a switching subgradient scheme [8, 55] with complexity \(O(1/\varepsilon ^2)\). However, as we will see, in practice our method converges much faster than this bound suggests.
More precisely, for a given set of N points \(A_k \in \mathbb {R}^n, k=1,\ldots , N\), consider the optimization problem
where Q is a convex and compact set, \(\alpha _{pi}\) are drawn from the standard normal distribution and then truncated to be positive. The corresponding Lagrange saddle point problem is defined as
As it was described in Sect. 6, this problem is equivalent to the variational inequality with the following monotone non-smooth operator:
For simplicity, we assume that there exists a (potentially very large) bound on the norm of the optimal Lagrange multiplier \(\lambda ^*\), which allows us to compactify the feasible set for the pair \((x,\lambda )\) to a Euclidean ball of some radius. We also believe that the approach of [14, 44] for dealing with unbounded feasible sets can be extended to our setting; we leave this for future work.
We run Algorithm 1 with different values of n, m, and N using the standard Euclidean prox-setup and the starting point \((x^0, \lambda ^0) = \frac{1}{\sqrt{m+n}} {\textbf {1}} \in \mathbb {R}^{n+m}\), where \({\textbf {1}}\) is the vector of all ones. The points \(A_k\), \(k=1,\ldots ,N\), are drawn randomly from the standard normal distribution. For each value of the parameters, the random data were drawn 10 times and the results averaged. The results are presented in Fig. 1: for each accuracy \(\varepsilon \in \{ 1/2^i, i=1,\ldots ,6\}\), we report the number of iterations and the running time in seconds required by Algorithm 1 to reach an \(\varepsilon \)-solution of the considered problem.
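To make the construction concrete, the monotone operator of the saddle-point problem can be assembled roughly as follows. This is only an illustrative sketch: the linear constraint form \(\varphi _p(x) = \langle \alpha _p, x\rangle - 1\) is a hypothetical stand-in (the paper's exact constraints are not reproduced here), while the objective \(f(x) = \sum _k \Vert x - A_k\Vert _2\) and the structure \(G(x,\lambda ) = (\partial _x L, -\partial _\lambda L)\) follow the standard saddle-point-to-VI reduction:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, N = 5, 3, 4
A = rng.standard_normal((N, n))              # the points A_k
alpha = np.abs(rng.standard_normal((m, n)))  # positive coefficients, as in the text

def f_subgrad(x):
    # Subgradient of f(x) = sum_k ||x - A_k||_2 (a sum of unit vectors).
    g = np.zeros_like(x)
    for a in A:
        d = x - a
        nrm = np.linalg.norm(d)
        if nrm > 1e-12:
            g += d / nrm
    return g

def phi(x):
    # Hypothetical linear constraints phi_p(x) = <alpha_p, x> - 1 (illustrative only).
    return alpha @ x - 1.0

def G(x, lam):
    # Monotone operator of the Lagrange saddle-point problem:
    # G(x, lam) = ( f'(x) + sum_p lam_p * phi_p'(x),  -phi(x) ).
    gx = f_subgrad(x) + alpha.T @ lam
    glam = -phi(x)
    return np.concatenate([gx, glam])
```

Monotonicity of G can be checked numerically: since f is convex and the coupling terms are skew-symmetric, \(\langle G(z_1)-G(z_2), z_1-z_2\rangle \ge 0\) for any pair of points \(z_i = (x_i, \lambda _i)\).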
As is known [46], for a VI with a non-smooth operator, the iteration complexity estimate \(O\left( \frac{1}{\varepsilon ^2}\right) \) is optimal. However, the slope of the lines in Fig. 1 shows that, due to its adaptivity, the proposed Algorithm 1 empirically achieves iteration complexity \(O\left( 1/\sqrt[4]{\varepsilon }\right) \).
Appendix C
In this appendix, to demonstrate the performance of the Generalized Mirror Prox with restarts (Algorithm 2), we consider a variational inequality with a Lipschitz-continuous and strongly monotone operator (see Example 5.2 in [40]):
We compare the proposed Algorithm 2 with the Modified Projection Method of [40]. We run Algorithm 2 with accuracies \(\varepsilon \in \{10^{-i}, i = 3,4,\ldots ,10\}\) and dimension \(n = 10^7\), taking \(Q = \{x \in \mathbb {R}^n: \Vert x\Vert _2 \le 2\}\). The results of the comparison are presented in Fig. 2, which shows the norm \(\Vert x_{\text {out}} - x_*\Vert _2\) as a function of the iteration number, where \(x_{\text {out}}\) is the output of each algorithm and \(x_*\) is the solution of problem (1) with the operator g given in (35); note that \(x_* = {\textbf {0}} \in \mathbb {R}^n\). In the experiments, we first run Algorithm 2 and record \(\Vert x_{\text {out}} - x_*\Vert _2\) for each of the above values of \(\varepsilon \), together with the corresponding number of iterations required by the algorithm. We then run the Modified Projection Method for the same numbers of iterations and compute the corresponding values of \(\Vert x_{\text {out}} - x_*\Vert _2\). Figure 2 shows the higher efficiency of the proposed Algorithm 2 and the significant gap in performance between the two algorithms.
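The restart idea behind Algorithm 2 can be illustrated on a toy strongly monotone operator. Below is a minimal sketch (ours, not the paper's implementation): plain Euclidean extragradient steps (mirror prox with the Euclidean prox-setup) on the ball \(Q = \{x: \Vert x\Vert _2 \le 2\}\), wrapped in a restart loop. The stand-in operator \(g(x) = x\), whose solution on Q is \(x_* = {\textbf {0}}\) (consistent with the solution reported above), the step size \(\gamma = 0.5\), and the restart schedule are all illustrative assumptions, not the choices made in Algorithm 2:

```python
import numpy as np

def proj_ball(x, radius=2.0):
    # Euclidean projection onto Q = {x : ||x||_2 <= radius}.
    nrm = np.linalg.norm(x)
    return x if nrm <= radius else x * (radius / nrm)

def extragradient(g, x0, gamma=0.5, iters=20):
    # One "stage": plain Euclidean extragradient (mirror prox) steps.
    x = x0
    for _ in range(iters):
        y = proj_ball(x - gamma * g(x))   # extrapolation step
        x = proj_ball(x - gamma * g(y))   # correction step
    return x

def restarted_extragradient(g, x0, stages=5, iters_per_stage=20):
    # Restart scheme: each stage restarts the base method from the previous
    # stage's output; for a strongly monotone operator the distance to the
    # solution contracts from stage to stage.
    x = x0
    for _ in range(stages):
        x = extragradient(g, x, iters=iters_per_stage)
    return x

g = lambda x: x                          # toy strongly monotone operator
x0 = np.full(50, 2.0 / np.sqrt(50))      # starting point on the boundary of Q
x_out = restarted_extragradient(g, x0)
print(np.linalg.norm(x_out))             # distance to the solution x* = 0
```

With these toy choices each extragradient step contracts the iterate toward the origin, so the printed distance is tiny; the point of the sketch is the two-level structure (inner extragradient stages, outer restarts), not the specific rates.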
Cite this article
Stonyakin, F., Gasnikov, A., Dvurechensky, P. et al. Generalized Mirror Prox Algorithm for Monotone Variational Inequalities: Universality and Inexact Oracle. J Optim Theory Appl 194, 988–1013 (2022). https://doi.org/10.1007/s10957-022-02062-7