Abstract
The majority of first-order methods for large-scale convex–concave saddle point problems and variational inequalities with monotone operators are proximal algorithms. To make such an algorithm practical, the problem's domain should be proximal-friendly, i.e., admit a strongly convex function with easy-to-minimize linear perturbations. As a by-product, such a domain admits a computationally cheap linear minimization oracle (LMO) capable of minimizing linear forms. There are, however, important situations where a cheap LMO is indeed available, but the problem domain is not proximal-friendly, which motivates the search for algorithms based solely on an LMO. For smooth convex minimization, there exists a classical LMO-based algorithm, the conditional gradient method. In contrast, the similar techniques known to us for other problems with convex structure (nonsmooth convex minimization, convex–concave saddle point problems, even ones as simple as bilinear, and variational inequalities with monotone operators, even ones as simple as affine) are quite recent and utilize a common approach based on Fenchel-type representations of the associated objectives/vector fields. The goal of this paper is to develop alternative (and seemingly much simpler) LMO-based decomposition techniques for bilinear saddle point problems and for variational inequalities with affine monotone operators.
Notes
Note that the saddle point frontier depends on the order of blocks in the x- and the y-variables, and this order will always be clear from the context.
The construction to follow can be easily extended from “knapsack-generated” matrices to more general “Dynamic Programming-generated” ones, see Sect. 1 in the “Appendix.”
For implementation details, see Sect. 1.
“a primal” instead of “the primal” reflects the fact that \(\varPsi \) is not uniquely defined by \(\varPhi \)—it is defined by \(\varPhi \) and \(\overline{\eta }\) and by how the values of \(\varPsi \) are selected when (32) does not specify these values uniquely.
“covers” instead of “is equivalent” stems from the fact that the scope of decomposition is not restricted to the setups of the form of (41).
Note that by applying the Carathéodory theorem, we could further "compress" the representations of approximate solutions, making them convex combinations of at most \(K+1\) of the \(\delta ^D_{d^i}\)s and \(\delta ^A_{a^i}\)s.
References
Juditsky, A., Nemirovski, A.: Solving variational inequalities with monotone operators on domains given by linear minimization oracles. Math. Program. 152(1), 1–36 (2013)
Harchaoui, Z., Juditsky, A., Nemirovski, A.: Conditional gradient algorithms for norm-regularized smooth convex optimization. Math. Program. 152(1), 75–112 (2014)
Ziegler, G.M.: Lectures on Polytopes, vol. 152. Springer, Berlin (1995)
Frank, M., Wolfe, P.: An algorithm for quadratic programming. Naval Res. Logist. Q. 3(1–2), 95–110 (1956)
Demyanov, V., Rubinov, A.: Approximate Methods in Optimization Problems, vol. 32. Elsevier, Amsterdam (1970)
Dunn, J.C., Harshbarger, S.: Conditional gradient algorithms with open loop step size rules. J. Math. Anal. Appl. 62(2), 432–444 (1978)
Freund, R.M., Grigas, P.: New analysis and results for the Frank–Wolfe method. Math. Program. 155(1–2), 199–230 (2016)
Garber, D., Hazan, E.: Faster Rates for the Frank–Wolfe Method Over Strongly-convex Sets. arXiv preprint arXiv:1406.1305 (2014)
Harchaoui, Z., Douze, M., Paulin, M., Dudik, M., Malick, J.: Large-scale image classification with trace-norm regularization. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3386–3393. IEEE (2012)
Jaggi, M.: Revisiting Frank–Wolfe: projection-free sparse convex optimization. In: Proceedings of the 30th International Conference on Machine Learning (ICML-13), pp. 427–435 (2013)
Jaggi, M., Sulovský, M.: A simple algorithm for nuclear norm regularized problems. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 471–478 (2010)
Pshenichny, B.N., Danilin, Y.M.: Numerical Methods in Extremal Problems. Mir, Moscow (1978)
Argyriou, A., Signoretto, M., Suykens, J.A.K.: Hybrid algorithms with applications to sparse and low rank regularization. In: Suykens, J.A.K., Signoretto, M., Argyriou, A. (eds.) Regularization, Optimization, Kernels, and Support Vector Machines, chap. 3, pp. 53–82. Chapman & Hall/CRC (2014)
Pierucci, F., Harchaoui, Z., Malick, J.: A Smoothing Approach for Composite Conditional Gradient with Nonsmooth Loss. Tech. rep., Inria (2014). https://hal.inria.fr/hal-01096630/
Tewari, A., Ravikumar, P.K., Dhillon, I.S.: Greedy algorithms for structurally constrained high dimensional problems. In: Advances in Neural Information Processing Systems, pp. 882–890 (2011)
Ying, Y., Li, P.: Distance metric learning with eigenvalue optimization. J. Mach. Learn. Res. 13(1), 1–26 (2012)
Cox, B., Juditsky, A., Nemirovski, A.: Dual subgradient algorithms for large-scale nonsmooth learning problems. Math. Program. 148(1–2), 143–180 (2014)
Lan, G., Zhou, Y.: Conditional Gradient Sliding for Convex Optimization (2014). http://www.ise.ufl.edu/glan/files/2015/09/CGS08-31
Hiriart-Urruty, J.B., Lemaréchal, C.: Convex Analysis and Minimization Algorithms I: Fundamentals, vol. 305. Springer, Berlin (2013)
Nemirovski, A., Onn, S., Rothblum, U.G.: Accuracy certificates for computational problems with convex structure. Math. Oper. Res. 35(1), 52–78 (2010)
Cox, B.: Applications of Accuracy Certificates for Problems with Convex Structure. Ph.D. thesis, Georgia Institute of Technology (2011). https://smartech.gatech.edu/jspui/bitstream/1853/39489/1/cox_bruce_a_201105_phd
Gol’stein, E.: Direct-dual block method of linear programming. Autom. Remote Control 57(11), 1531–1536 (1996)
Gol’stein, E., Sokolov, N.: A decomposition algorithm for solving multicommodity production-and-transportation problem. Ekonomika i Matematicheskie Metody 33(1), 112–128 (1997)
Dvurechensky, P., Nesterov, Y., Spokoiny, V.: Primal-dual methods for solving infinite-dimensional games. J. Optim. Theory Appl. 166(1), 23–51 (2015)
Bellman, R.: On “Colonel Blotto” and analogous games. SIAM Rev. 11(1), 66–68 (1969)
Robertson, B.: The Colonel Blotto game. Econ. Theory 29(1), 1–24 (2006)
Grant, M., Boyd, S.: CVX: MATLAB Software for Disciplined Convex Programming, version 2.1 (2015). http://cvxr.com/cvx
Acknowledgments
A. Juditsky was supported by the CNRS-Mastodons project Titan, and the LabEx PERSYVAL-Lab (ANR-11-LABX-0025). Research of A. Nemirovski was supported by the NSF Grants CMMI-1232623, CCF-1415498, CMMI-1262063.
Author information
Authors and Affiliations
Corresponding author
Appendix
Proof of Lemma 2.1
It suffices to prove the \(\phi \)-related statements. Lipschitz continuity of \(\phi \) in the direct product case is evident. Furthermore, the function \(\theta (x_1,x_2;y_1)=\max \limits _{y_2\in Y_2[y_1]}\varPhi (x_1,x_2;y_1,y_2)\) is convex and Lipschitz continuous in \(x=[x_1;x_2]\in X\) for every \(y_1\in Y_1\), whence
is convex and lower semicontinuous in \(x_1\in X_1\) (note that X is compact). On the other hand,
so that \(\chi (x_1;y_1,y_2)\) is concave and Lipschitz continuous in \(y=[y_1;y_2]\in Y\) for every \(x_1\in X_1\), whence
is concave and upper semicontinuous in \(y_1\in Y_1\) (note that Y is compact).
Next, we have
as required in (2). Finally, let \(\bar{x}=[\bar{x}_1;\bar{x}_2]\in X\) and \(\bar{y}=[\bar{y}_1;\bar{y}_2]\in Y\). We have
and
We conclude that
as claimed in (3). \(\square \)
Proof of Lemma 2.2
For \(x_1\in X_1\), we have
as claimed in (a). “Symmetric” reasoning justifies (b). \(\square \)
Proof of Lemma 2.3
Assume that (5) holds true. Then, G clearly is certifying, implying that
and therefore (5) reads
whence, taking the minimum over \(x_2\in X_2[x_1]\) in the left-hand side,
as claimed in (ii).
Now assume that (i) and (ii) hold true. By (i), \(\chi _G(\bar{x}_1)=\langle G,[\bar{x}_1;\bar{x}_2]\rangle \), and by (ii) combined with the definition of \(\chi _G\),
implying (5). \(\square \)
1.1 Dynamic Programming-Generated Simple Matrices
Consider the following situation. A system \(\mathcal{S}\) evolves in time, with its state \(\xi _s\) at time \(s=1,2,\ldots ,m\) belonging to a given finite nonempty set \(\Xi _s\). Furthermore, every pair \((\xi ,s)\) with \(s\in \{1,\ldots ,m\}\), \(\xi \in \Xi _s\) is associated with a nonempty finite set of actions \(A^s_\xi \), and we set
Furthermore, for every s, \(1\le s< m\), a transition mapping \(\pi _{s}(\xi ,a):\mathcal{S}_s\rightarrow \Xi _{s+1}\) is given. Finally, we are given vector-valued functions ("outputs") \(\chi _s:\mathcal{S}_s\rightarrow {\mathbb {R}}^{r_s}\).
A trajectory of \(\mathcal{S}\) is a sequence \(\{(\xi _s,a_s):1\le s\le m\}\) such that \((\xi _s,a_s)\in \mathcal{S}_s\) for \(1\le s\le m\) and
The output of a trajectory \(\tau =\{(\xi _s,a_s):1\le s\le m\}\) is the block vector
We can associate with \(\mathcal{S}\) the matrix \(D=D[\mathcal{S}]\) with \(K=r_1+\cdots +r_m\) rows and with columns indexed by the trajectories of \(\mathcal{S}\); specifically, the column indexed by a trajectory \(\tau \) is \(\chi [\tau ]\).
For example, the knapsack-generated matrix D associated with the knapsack data from Sect. 2.6.2 is of the form \(D[\mathcal{S}]\), with the system \(\mathcal{S}\) defined as follows:
- \(\Xi _s\), \(s=1,\ldots ,m\), is the set of nonnegative integers which are \(\le H\);
- \(A^s_\xi \) is the set of nonnegative integers a such that \(a\le \bar{p}_s\) and \(\xi -ah_s\ge 0\);
- the transition mappings are \(\pi _{s}(\xi ,a)=\xi -ah_s\);
- the outputs are \(\chi _s(\xi ,a)=f_s(a)\), \(1\le s\le m\).
In the notation of Sect. 2.6.2, vectors \([p_1;\ldots ;p_m]\in \mathcal{P}\) are exactly the sequences of actions \(a_1,\ldots ,a_m\) stemming from the trajectories of the just defined system \(\mathcal{S}\).
Observe that matrix \(D=D[\mathcal{S}]\) is simple, provided the cardinalities of \(\Xi _s\) and \(A^s_\xi \) are reasonable. Indeed, given \(x=[x_1;\ldots ;x_m]\in {\mathbb {R}}^{n}={\mathbb {R}}^{r_1}\times \cdots \times {\mathbb {R}}^{r_m}\), we can identify \(\overline{D}[x]\) by dynamic programming, running first the backward Bellman recurrence
(where \(U_{m+1}(\cdot )\equiv 0\)), and then recovering the (trajectory indexing the) column of D corresponding to \(\overline{D}[x]\) by running the forward Bellman recurrence
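As an illustration, the two recurrences can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation; all identifiers (`best_column`, `Xi`, `A`, `pi`, `chi`) are our hypothetical encoding of \(\Xi _s\), \(A^s_\xi \), \(\pi _s\), \(\chi _s\), and we read \(\overline{D}[x]\) as the column of D maximizing the inner product with x.

```python
def best_column(x, Xi, A, pi, chi, m):
    """Identify (the trajectory indexing) the column of D = D[S] that
    maximizes sum_s <x_s, chi_s(xi_s, a_s)> over all trajectories.

    x  : x[1..m], per-stage weight vectors (x[0] is unused)
    Xi : Xi[s] = iterable of states at stage s
    A  : A(s, xi) = iterable of admissible actions at (xi, s)
    pi : pi(s, xi, a) = next state (defined for s < m)
    chi: chi(s, xi, a) = stage output, a vector of length r_s
    """
    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    # Backward Bellman recurrence:
    #   U_s(xi) = max_{a in A^s_xi} [ <x_s, chi_s(xi,a)> + U_{s+1}(pi_s(xi,a)) ],
    # with U_{m+1} identically 0.
    U = {m + 1: {}}
    for s in range(m, 0, -1):
        U[s] = {}
        for xi in Xi[s]:
            vals = [dot(x[s], chi(s, xi, a))
                    + (U[s + 1].get(pi(s, xi, a), 0.0) if s < m else 0.0)
                    for a in A(s, xi)]
            if vals:
                U[s][xi] = max(vals)

    # Forward Bellman recurrence: start from the best initial state and
    # greedily pick optimal actions, recovering an optimal trajectory.
    xi = max(U[1], key=U[1].get)
    val, traj = U[1][xi], []
    for s in range(1, m + 1):
        def score(a):
            nxt = U[s + 1].get(pi(s, xi, a), 0.0) if s < m else 0.0
            return dot(x[s], chi(s, xi, a)) + nxt
        a = max(A(s, xi), key=score)
        traj.append((xi, a))
        if s < m:
            xi = pi(s, xi, a)
    return val, traj
```

The cost is proportional to \(\sum _s\sum _{\xi \in \Xi _s}|A^s_\xi |\), which is exactly why \(D[\mathcal{S}]\) is simple when the cardinalities of \(\Xi _s\) and \(A^s_\xi \) are reasonable.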
1.2 Attacker Versus Defender Via Ellipsoid Algorithm
In our implementation:

1. Relation (39) is ensured by specifying U, V as Euclidean balls of radius R centered at the origin, where R is an upper bound on the Euclidean norms of the columns of D and of A (such a bound is easily obtained from the knapsack data specifying the matrices D, A).
2. We process the monotone vector field associated with the primal SP problem (30), that is, the field

$$\begin{aligned} F(u,v)=[F_u(u,v)=\overline{A}[u]-v;\;F_v(u,v)=u-\underline{D}[v]] \end{aligned}$$

by the ellipsoid algorithm with accuracy certificates from [20]. For \(\tau =1,2,\ldots \), the algorithm generates search points \([u_\tau ;v_\tau ]\in {\mathbb {R}}^K\times {\mathbb {R}}^K\), with \([u_1;v_1]=0\), along with execution protocols \(\mathcal{I}^\tau =\{[u_i;v_i],F(u_i,v_i):i\in I_\tau \}\), where \(I_\tau =\{i\le \tau :[u_i;v_i]\in U\times V\}\), augmented by accuracy certificates \(\lambda ^\tau =\{\lambda ^\tau _i\ge 0:i\in I_\tau \}\) such that \(\sum _{i\in I_\tau }\lambda ^\tau _i=1\). From the results of [20], it follows that for every \(\epsilon >0\),
$$\begin{aligned} \tau \ge N(\epsilon ):= O(1)K^2\ln \left( 2{R+\epsilon \over \epsilon }\right) \Rightarrow {\mathrm{Res}}(\mathcal{I}^\tau ,\lambda ^\tau \big |U\times V)\le \epsilon . \end{aligned}$$

(45)

3. When computing \(F(u_i,v_i)\) (this computation takes place only at productive steps, i.e., those with \([u_i;v_i]\in U\times V\)), we get, as a by-product, the columns \(A^i=\overline{A}[u_i]\) and \(D^i=\underline{D}[v_i]\) of the matrices A, D, along with the indexes \(a^i\), \(d^i\) of these columns (recall that these indexes are pure strategies of the attacker and the defender and thus, by the construction of A, D, are collections of m nonnegative integers). In our implementation, we stored these columns, along with their indexes and the corresponding search points \([u_i;v_i]\). As is immediately seen, in the case in question the approximate solution \([w^\tau ;z^\tau ]\) to the SP problem of interest (27) induced by the execution protocol \(\mathcal{I}^\tau \) and the accuracy certificate \(\lambda ^\tau \) is comprised of two sparse vectors
$$\begin{aligned} w^\tau =\sum _{i\in I_\tau }\lambda ^\tau _i\delta ^D_{d^i},\quad z^\tau =\sum _{i\in I_\tau }\lambda ^\tau _i\delta ^A_{a^i}, \end{aligned}$$

(46)

where \(\delta ^D_d\) is the "dth basic orth" in the simplex \(\varDelta _N\) of probabilistic vectors with entries indexed by pure strategies of the defender, and similarly for \(\delta ^A_a\). Thus, we have no difficulties with representing our approximate solutions, in spite of their huge ambient dimension.
According to our general theory and (45), the number of steps needed to get an \(\epsilon \)-solution [w; z] to the problem of interest (i.e., a feasible solution with \(\epsilon _{{\tiny \mathrm sad}}([w;z]\big |\psi ,W,Z)\le \epsilon )\) does not exceed \(N(\epsilon )\), with computational effort per step dominated by the necessity to identify \(\overline{A}[u_i]\), \(\underline{D}[v_i]\) by dynamic programming.
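For illustration, the sparse bookkeeping behind (46) amounts to accumulating certificate weights on the (few) distinct pure strategies encountered at productive steps. A minimal sketch under our own naming conventions (`sparse_solution` is hypothetical; pure strategies are encoded as tuples of m nonnegative integers):

```python
from collections import defaultdict

def sparse_solution(weights, d_indices, a_indices):
    """Accumulate the convex combinations (46) sparsely:
    map pure strategy (a tuple of m nonnegative integers) -> total weight,
    instead of forming dense vectors of huge ambient dimension.

    weights   : accuracy certificate {lambda^tau_i : i in I_tau}
    d_indices : defender pure strategies d^i (one per productive step)
    a_indices : attacker pure strategies a^i
    """
    w, z = defaultdict(float), defaultdict(float)
    for lam, d, a in zip(weights, d_indices, a_indices):
        w[d] += lam  # w^tau = sum_i lambda_i * (d^i-th basic orth)
        z[a] += lam  # z^tau = sum_i lambda_i * (a^i-th basic orth)
    return dict(w), dict(z)
```

The supports of \(w^\tau \), \(z^\tau \) never exceed \(|I_\tau |\) entries, whatever the ambient dimension N.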
In fact, we used the outlined scheme with two straightforward modifications.
- First, instead of building the accuracy certificates \(\lambda ^\tau \) according to the rules from [20], we used the best accuracy certificates for the given execution protocols \(\mathcal{I}^\tau \), obtained by solving the convex program

$$\begin{aligned} \min _\lambda \left\{ {\mathrm{Res}}(\mathcal{I}^\tau ,\lambda \big |U\times V):=\max _{y\in U\times V}\sum _{i\in I_\tau } \lambda _i \langle F(u_i,v_i),[u_i;v_i]-y\rangle :\lambda _i\ge 0,\sum _{i\in I_\tau }\lambda _i=1\right\} . \end{aligned}$$

(47)

In our implementation, this problem was solved once per \(4K^2\) steps. Note that with U, V being Euclidean balls, (47) is a conic quadratic problem and may be solved using, e.g., CVX [27].
- Second, given the current approximate solution (46) to the problem of interest, we can compute its saddle point inaccuracy exactly, instead of upper-bounding it by \({\mathrm{Res}}(\mathcal{I}^\tau ,\lambda ^\tau \big |U\times V)\). Indeed, it is immediately seen that

$$\begin{aligned} \epsilon _{{\tiny \mathrm sad}}([w^\tau ;z^\tau ]\big |\psi ,W,Z)={\hbox {Max}}\left( A^T\left[ \sum _{i\in I_\tau }\lambda ^\tau _iD^i\right] \right) -{\hbox {Min}}\left( D^T\left[ \sum _{i\in I_\tau }\lambda ^\tau _iA^i\right] \right) . \end{aligned}$$

In our implementation, we performed this computation each time a new accuracy certificate was computed, and terminated the solution process when the saddle point inaccuracy dropped below a given threshold (\(10^{-4}\)).
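Reading (27) as a bilinear saddle point problem with cost \(\psi (w,z)=\langle Dw, Az\rangle \) (which is what the identity above expresses, with \(D^i=D\delta ^D_{d^i}\), \(A^i=A\delta ^A_{a^i}\)), the exact inaccuracy is cheap to evaluate from the stored columns. A hedged NumPy sketch under that assumption (the function name and calling convention are ours):

```python
import numpy as np

def exact_sad_inaccuracy(D, A, lam, d_idx, a_idx):
    """Exact saddle point inaccuracy of the solution (46), assuming the
    bilinear cost psi(w, z) = <Dw, Az>:
        eps_sad = Max(A^T [sum_i lam_i D^i]) - Min(D^T [sum_i lam_i A^i]),
    where D^i = D[:, d_i] and A^i = A[:, a_i] are the stored columns.
    """
    Dw = D[:, d_idx] @ lam  # D w^tau, a combination of |I_tau| columns
    Az = A[:, a_idx] @ lam  # A z^tau
    return float(np.max(A.T @ Dw) - np.min(D.T @ Az))
```

Since \(\max _z\psi (w^\tau ,z)\ge \psi (w^\tau ,z^\tau )\ge \min _w\psi (w,z^\tau )\) over the simplices, the returned gap is nonnegative for every feasible pair and can directly drive the termination test.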
Proof of Proposition 3.2
(i): Let \(\xi _1,\xi _2\in \Xi \), and let \(\eta _1=\overline{\eta }(\xi _1)\), \(\eta _2=\overline{\eta }(\xi _2)\). By (32), we have
Summing these inequalities up, we get
so that \(\varPsi \) is monotone.
Furthermore, the first inequality in (35) is due to Proposition 3.1. To prove the second inequality in (35), let \(\mathcal{I}_t=\{\xi _i\in \Xi ,\varPsi (\xi _i):1\le i\le t\}\), \(\mathcal{J}_t=\{\theta _i:=[\xi _i;\overline{\eta }(\xi _i)],\varPhi (\theta _i):1\le i\le t\}\), and let \(\lambda \) be a t-step accuracy certificate. We have
(i) is proved.
(ii): Let \(\eta \in H\). Invoking (34), we have
and (36) follows. \(\square \)
Cite this article
Cox, B., Juditsky, A. & Nemirovski, A. Decomposition Techniques for Bilinear Saddle Point Problems and Variational Inequalities with Affine Monotone Operators. J Optim Theory Appl 172, 402–435 (2017). https://doi.org/10.1007/s10957-016-0949-3
Keywords
- Decomposition techniques
- Conditional gradients
- Variational problems with affine monotone operator
- Proximal algorithms