
Decomposition Techniques for Bilinear Saddle Point Problems and Variational Inequalities with Affine Monotone Operators

Journal of Optimization Theory and Applications

Abstract

The majority of first-order methods for large-scale convex–concave saddle point problems and variational inequalities with monotone operators are proximal algorithms. To make such an algorithm practical, the problem's domain should be proximal-friendly, i.e., admit a strongly convex function whose linear perturbations are easy to minimize. As a by-product, such a domain admits a computationally cheap linear minimization oracle (LMO) capable of minimizing linear forms. There are, however, important situations where a cheap LMO is indeed available, but the problem domain is not proximal-friendly, which motivates the search for algorithms based solely on an LMO. For smooth convex minimization, there exists a classical LMO-based algorithm, the conditional gradient method. In contrast, the similar techniques known to us for other problems with convex structure (nonsmooth convex minimization, convex–concave saddle point problems, even as simple as bilinear ones, and variational inequalities with monotone operators, even as simple as affine ones) are quite recent and utilize a common approach based on Fenchel-type representations of the associated objectives/vector fields. The goal of this paper is to develop alternative (and seemingly much simpler) LMO-based decomposition techniques for bilinear saddle point problems and for variational inequalities with affine monotone operators.


Notes

  1. In retrospect, a special case of this strategy was used in [22–24].

  2. Note that the saddle point frontier depends on the order of blocks in the x- and the y-variables, and this order will always be clear from the context.

  3. The construction to follow can be easily extended from “knapsack-generated” matrices to more general “Dynamic Programming-generated” ones, see Sect. 1 in the “Appendix.”

  4. This story is a variation of what is called “Colonel Blotto Game” in Game Theory; see, e.g., [25, 26] and references therein.

  5. For implementation details, see Sect. 1.

  6. “a primal” instead of “the primal” reflects the fact that \(\varPsi \) is not uniquely defined by \(\varPhi \)—it is defined by \(\varPhi \) and \(\overline{\eta }\) and by how the values of \(\varPsi \) are selected when (32) does not specify these values uniquely.

  7. “covers” instead of “is equivalent” stems from the fact that the scope of decomposition is not restricted to the setups of the form of (41).

  8. Note that, applying the Carathéodory theorem, we could further “compress” the representations of approximate solutions, i.e., make these solutions convex combinations of at most \(K+1\) of the \(\delta ^D_{d^i}\)s and \(\delta ^A_{a^i}\)s.

References

  1. Juditsky, A., Nemirovski, A.: Solving variational inequalities with monotone operators on domains given by linear minimization oracles. Math. Program. 152(1), 1–36 (2013)

  2. Harchaoui, Z., Juditsky, A., Nemirovski, A.: Conditional gradient algorithms for norm-regularized smooth convex optimization. Math. Program. 152(1), 75–112 (2014)

  3. Ziegler, G.M.: Lectures on Polytopes, vol. 152. Springer, Berlin (1995)

  4. Frank, M., Wolfe, P.: An algorithm for quadratic programming. Naval Res. Logist. Q. 3(1–2), 95–110 (1956)

  5. Demyanov, V., Rubinov, A.: Approximate Methods in Optimization Problems, vol. 32. Elsevier, Amsterdam (1970)

  6. Dunn, J.C., Harshbarger, S.: Conditional gradient algorithms with open loop step size rules. J. Math. Anal. Appl. 62(2), 432–444 (1978)

  7. Freund, R.M., Grigas, P.: New analysis and results for the Frank–Wolfe method. Math. Program. 155(1–2), 199–230 (2016)

  8. Garber, D., Hazan, E.: Faster Rates for the Frank–Wolfe Method Over Strongly-convex Sets. arXiv preprint arXiv:1406.1305 (2014)

  9. Harchaoui, Z., Douze, M., Paulin, M., Dudik, M., Malick, J.: Large-scale image classification with trace-norm regularization. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3386–3393. IEEE (2012)

  10. Jaggi, M.: Revisiting Frank–Wolfe: projection-free sparse convex optimization. In: Proceedings of the 30th International Conference on Machine Learning (ICML-13), pp. 427–435 (2013)

  11. Jaggi, M., Sulovský, M.: A simple algorithm for nuclear norm regularized problems. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 471–478 (2010)

  12. Pshenichny, B.N., Danilin, Y.M.: Numerical Methods in Extremal Problems. Mir, Moscow (1978)

  13. Argyriou, A., Signoretto, M., Suykens, J.A.K.: Hybrid algorithms with applications to sparse and low rank regularization, chap. 3. In: Suykens, J.A.K., Signoretto, M., Argyriou, A. (eds.) Regularization, Optimization, Kernels, and Support Vector Machines, pp. 53–82. Chapman & Hall/CRC (2014)

  14. Pierucci, F., Harchaoui, Z., Malick, J.: A Smoothing Approach for Composite Conditional Gradient with Nonsmooth Loss. Tech. rep., Inria (2014). https://hal.inria.fr/hal-01096630/

  15. Tewari, A., Ravikumar, P.K., Dhillon, I.S.: Greedy algorithms for structurally constrained high dimensional problems. In: Advances in Neural Information Processing Systems, pp. 882–890 (2011)

  16. Ying, Y., Li, P.: Distance metric learning with eigenvalue optimization. J. Mach. Learn. Res. 13(1), 1–26 (2012)

  17. Cox, B., Juditsky, A., Nemirovski, A.: Dual subgradient algorithms for large-scale nonsmooth learning problems. Math. Program. 148(1–2), 143–180 (2014)

  18. Lan, G., Zhou, Y.: Conditional Gradient Sliding for Convex Optimization (2014). http://www.ise.ufl.edu/glan/files/2015/09/CGS08-31

  19. Hiriart-Urruty, J.B., Lemaréchal, C.: Convex Analysis and Minimization Algorithms I: Fundamentals, vol. 305. Springer, Berlin (2013)

  20. Nemirovski, A., Onn, S., Rothblum, U.G.: Accuracy certificates for computational problems with convex structure. Math. Oper. Res. 35(1), 52–78 (2010)

  21. Cox, B.: Applications of Accuracy Certificates for Problems with Convex Structure. Ph.D. thesis, Georgia Institute of Technology (2011). https://smartech.gatech.edu/jspui/bitstream/1853/39489/1/cox_bruce_a_201105_phd

  22. Gol’stein, E.: Direct-dual block method of linear programming. Autom. Remote Control 57(11), 1531–1536 (1996)

  23. Gol’stein, E., Sokolov, N.: A decomposition algorithm for solving multicommodity production-and-transportation problem. Ekonomika i Matematicheskie Metody 33(1), 112–128 (1997)

  24. Dvurechensky, P., Nesterov, Y., Spokoiny, V.: Primal-dual methods for solving infinite-dimensional games. J. Optim. Theory Appl. 166(1), 23–51 (2015)

  25. Bellman, R.: On “Colonel Blotto” and analogous games. SIAM Rev. 11(1), 66–68 (1969)

  26. Roberson, B.: The Colonel Blotto game. Econ. Theory 29(1), 1–24 (2006)

  27. Grant, M., Boyd, S.: CVX: MATLAB Software for Disciplined Convex Programming, Version 2.1 (2015). http://cvxr.com/cvx


Acknowledgments

A. Juditsky was supported by the CNRS-Mastodons project Titan, and the LabEx PERSYVAL-Lab (ANR-11-LABX-0025). Research of A. Nemirovski was supported by the NSF Grants CMMI-1232623, CCF-1415498, CMMI-1262063.

Author information

Correspondence to Anatoli Juditsky.

Appendix

Proof of Lemma 2.1

It suffices to prove the \(\phi \)-related statements. Lipschitz continuity of \(\phi \) in the direct product case is evident. Furthermore, the function \(\theta (x_1,x_2;y_1)=\max \limits _{y_2\in Y_2[y_1]}\varPhi (x_1,x_2;y_1,y_2)\) is convex and Lipschitz continuous in \(x=[x_1;x_2]\in X\) for every \(y_1\in Y_1\), whence

$$\begin{aligned} \phi (x_1,y_1)=\min \limits _{x_2\in X_2[x_1]}\theta (x_1,x_2;y_1) \end{aligned}$$

is convex and lower semicontinuous in \(x_1\in X_1\) (note that X is compact). On the other hand,

$$\begin{aligned} \begin{array}{rcl} \phi (x_1,y_1)&{}=&{}\max \limits _{y_2\in Y_2[y_1]}\min \limits _{x_2\in X_2[x_1]}\varPhi (x_1,x_2;y_1,y_2)\\ &{}=&{}\max \limits _{y_2\in Y_2[y_1]} \left[ \chi (x_1;y_1,y_2):=\min \limits _{x_2\in X_2[x_1]}\varPhi (x_1,x_2;y_1,y_2)\right] , \end{array} \end{aligned}$$

so that \(\chi (x_1;y_1,y_2)\) is concave and Lipschitz continuous in \(y=[y_1;y_2]\in Y\) for every \(x_1\in X_1\), whence

$$\begin{aligned} \phi (x_1,y_1)=\max \limits _{y_2\in Y_2[y_1]}\chi (x_1;y_1,y_2) \end{aligned}$$

is concave and upper semicontinuous in \(y_1\in Y_1\) (note that Y is compact).

Next, we have

$$\begin{aligned} \begin{array}{l} {\mathrm{SadVal}}(\phi ,X_1,Y_1)=\inf \limits _{x_1\in X_1}\left[ \sup \limits _{y_1\in Y_1}\left[ \sup \limits _{y_2:[y_1;y_2]\in Y}\inf \limits _{x_2: [x_1;x_2]\in X}\varPhi (x_1,x_2;y_1,y_2)\right] \right] \\ \quad = \inf \limits _{x_1\in X_1}\left[ \sup \limits _{[y_1;y_2]\in Y}\inf \limits _{x_2:[x_1;x_2]\in X}\varPhi (x_1,x_2;y_1,y_2)\right] \\ \quad =\inf \limits _{x_1\in X_1}\left[ \inf \limits _{x_2:[x_1;x_2]\in X}\sup \limits _{[y_1;y_2]\in Y}\varPhi (x_1,x_2;y_1,y_2)\right] \hbox { [by Sion–Kakutani Theorem [19]]}\\ \quad =\inf \limits _{[x_1;x_2]\in X}\sup \limits _{[y_1;y_2]\in Y}\varPhi (x_1,x_2;y_1,y_2)={\mathrm{SadVal}}(\varPhi ,X,Y),\\ \end{array} \end{aligned}$$

as required in (2). Finally, let \(\bar{x}=[\bar{x}_1;\bar{x}_2]\in X\) and \(\bar{y}=[\bar{y}_1;\bar{y}_2]\in Y\). We have

$$\begin{aligned} \begin{array}{rcl} \overline{\phi }(\bar{x}_1)-{\mathrm{SadVal}}(\phi ,X_1,Y_1)&{}=&{}\overline{\phi }(\bar{x}_1)-{\mathrm{SadVal}}(\varPhi ,X,Y)\hbox { [by (2)]}\\ &{}=&{}\sup \limits _{y_1\in Y_1}\phi (\bar{x}_1,y_1)-{\mathrm{SadVal}}(\varPhi ,X,Y)\\ &{}=&{}\sup \limits _{y_1\in Y_1}\sup \limits _{y_2:[y_1;y_2]\in Y}\inf \limits _{x_2:[\bar{x}_1;x_2]\in X}\varPhi (\bar{x}_1,x_2;y_1,y_2)\\ &{}&{}-{\mathrm{SadVal}}(\varPhi ,X,Y)\\ &{}=&{}\sup \limits _{[y_1;y_2]\in Y}\inf \limits _{x_2:[\bar{x}_1;x_2]\in X}\varPhi (\bar{x}_1,x_2;y_1,y_2)-{\mathrm{SadVal}}(\varPhi ,X,Y)\\ &{}=&{}\inf \limits _{x_2:[\bar{x}_1;x_2]\in X}\sup \limits _{y=[y_1;y_2]\in Y}\varPhi (\bar{x}_1,x_2;y)-{\mathrm{SadVal}}(\varPhi ,X,Y)\\ &{}\le &{}\sup \limits _{y=[y_1;y_2]\in Y}\varPhi (\bar{x}_1,\bar{x}_2;y)-{\mathrm{SadVal}}(\varPhi ,X,Y)\\ &{}=&{}\overline{\varPhi }(\bar{x})-{\mathrm{SadVal}}(\varPhi ,X,Y)\\ \end{array} \end{aligned}$$

and

$$\begin{aligned} \begin{array}{rcl} {\mathrm{SadVal}}(\phi ,X_1,Y_1)-\underline{\phi }(\bar{y}_1)&{}=&{}{\mathrm{SadVal}}(\varPhi ,X,Y)-\underline{\phi }(\bar{y}_1)\hbox { [by (2)]}\\ &{}=&{}{\mathrm{SadVal}}(\varPhi ,X,Y)-\inf \limits _{x_1\in X_1}\phi (x_1,\bar{y}_1)\\ &{}=&{}{\mathrm{SadVal}}(\varPhi ,X,Y)\\ &{}&{}-\inf \limits _{x_1\in X_1}\left[ \inf \limits _{x_2:[x_1;x_2]\in X}\sup \limits _{y_2:[\bar{y}_1;y_2]\in Y}\varPhi (x_1,x_2;\bar{y}_1,y_2)\right] \\ &{}=&{}{\mathrm{SadVal}}(\varPhi ,X,Y)\!-\!\inf \limits _{x=[x_1;x_2]\in X}\sup \limits _{y_2:[\bar{y}_1;y_2]\in Y}\varPhi (x;\bar{y}_1,y_2)\\ &{}\le &{} {\mathrm{SadVal}}(\varPhi ,X,Y)-\inf \limits _{x=[x_1;x_2]\in X}\varPhi (x;\bar{y}_1,\bar{y}_2)\\ &{}=&{}{\mathrm{SadVal}}(\varPhi ,X,Y)-\underline{\varPhi }(\bar{y}).\\ \end{array} \end{aligned}$$

We conclude that

$$\begin{aligned} \begin{array}{l} \epsilon _{{\tiny \mathrm sad}}([\bar{x}_1;\bar{y}_1]\big |\phi ,X_1,Y_1)=\left[ \overline{\phi }(\bar{x}_1)-{\mathrm{SadVal}}(\phi ,X_1,Y_1)\right] \\ \quad +\left[ {\mathrm{SadVal}}(\phi ,X_1,Y_1)-\underline{\phi }(\bar{y}_1)\right] \\ \le \left[ \overline{\varPhi }(\bar{x})-{\mathrm{SadVal}}(\varPhi ,X,Y)\right] +\left[ {\mathrm{SadVal}}(\varPhi ,X,Y)-\underline{\varPhi }(\bar{y})\right] \!=\! \epsilon _{{\tiny \mathrm sad}}([\bar{x};\bar{y}]\big |\varPhi ,X,Y),\\ \end{array} \end{aligned}$$

as claimed in (3). \(\square \)

Proof of Lemma 2.2

For \(x_1\in X_1\), we have

$$\begin{aligned} \begin{array}{l} \phi (x_1;\bar{y}_1) =\min \limits _{x_2:[x_1;x_2]\in X}\max \limits _{y_2:[\bar{y}_1;y_2]\in Y}\varPhi (x_1,x_2;\bar{y}_1,y_2)\ge \min \limits _{x_2:[x_1;x_2]\in X}\varPhi (x_1,x_2;\bar{y}_1,\bar{y}_2)\\ \quad \ge \min \limits _{x_2:[x_1;x_2]\in X}\big [\underbrace{\varPhi (\bar{x};\bar{y})}_{\phi (\bar{x}_1;\bar{y}_1)}+\langle G,[x_1;x_2]-[\bar{x}_1;\bar{x}_2]\rangle \big ]\\ ~~~\, [\hbox {since }\varPhi (x;\bar{y})\hbox { is convex and } G\in \partial _x\varPhi (\bar{x};\bar{y})]\\ \ge \phi (\bar{x}_1;\bar{y}_1)+\langle g,x_1-\bar{x}_1\rangle \, [\hbox {by definition of }g,G],\\ \end{array} \end{aligned}$$

as claimed in (a). “Symmetric” reasoning justifies (b). \(\square \)

Proof of Lemma 2.3

Assume that (5) holds true. Then, G clearly is certifying, implying that

$$\begin{aligned} \chi _G(\bar{x}_1)=\langle G,[\bar{x}_1;\bar{x}_2]\rangle , \end{aligned}$$

and therefore (5) reads

$$\begin{aligned} \langle G,[x_1;x_2]\rangle \ge \chi _G(\bar{x}_1)+\langle g,x_1-\bar{x}_1\rangle \quad \forall x=[x_1;x_2]\in X, \end{aligned}$$

whence, taking the minimum of the left-hand side over \(x_2\in X_2[x_1]\), we get

$$\begin{aligned} \chi _G(x_1)\ge \chi _G(\bar{x}_1)+\langle g,x_1-\bar{x}_1\rangle \quad \forall x_1\in X_1, \end{aligned}$$

as claimed in (ii).

Now assume that (i) and (ii) hold true. By (i), \(\chi _G(\bar{x}_1)=\langle G,[\bar{x}_1;\bar{x}_2]\rangle \), and by (ii) combined with the definition of \(\chi _G\),

$$\begin{aligned} \forall x= & {} [x_1;x_2]\in X: \langle G,[x_1;x_2]\rangle \ge \chi _G(x_1)\ge \chi _G(\bar{x}_1)+\langle g,x_1-\bar{x}_1\rangle \\= & {} \langle G,\bar{x}\rangle + \langle g,x_1-\bar{x}_1\rangle , \end{aligned}$$

implying (5). \(\square \)

1.1 Dynamic Programming-Generated Simple Matrices

Consider the following situation. A system \(\mathcal{S}\) evolves in time, with state \(\xi _s\) at time \(s=1,2,\ldots ,m\) belonging to a given finite nonempty set \(\Xi _s\). Furthermore, every pair \((\xi ,s)\) with \(s\in \{1,\ldots ,m\}\), \(\xi \in \Xi _s\) is associated with a nonempty finite set of actions \(A^s_\xi \), and we set

$$\begin{aligned} \mathcal{S}_s=\{(\xi ,a):\xi \in \Xi _s,a\in A^s_\xi \}. \end{aligned}$$

Furthermore, for every s, \(1\le s< m\), a transition mapping \(\pi _{s}(\xi ,a):\mathcal{S}_s\rightarrow \Xi _{s+1}\) is given. Finally, we are given vector-valued functions (“outputs”) \(\chi _s:\mathcal{S}_s\rightarrow {\mathbb {R}}^{r_s}\).

A trajectory of \(\mathcal{S}\) is a sequence \(\{(\xi _s,a_s):1\le s\le m\}\) such that \((\xi _s,a_s)\in \mathcal{S}_s\) for \(1\le s\le m\) and

$$\begin{aligned} \xi _{s+1}=\pi _{s}(\xi _s,a_s),\,1\le s<m. \end{aligned}$$

The output of a trajectory \(\tau =\{(\xi _s,a_s):1\le s\le m\}\) is the block vector

$$\begin{aligned} \chi [\tau ]=[\chi _1(\xi _1,a_1);\ldots ;\chi _m(\xi _m,a_m)]. \end{aligned}$$

We can associate with \(\mathcal{S}\) the matrix \(D=D[\mathcal{S}]\) with \(K=r_1+\cdots +r_m\) rows and with columns indexed by the trajectories of \(\mathcal{S}\); specifically, the column indexed by a trajectory \(\tau \) is \(\chi [\tau ]\).
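To make the definition concrete, the following minimal sketch (in Python; the containers `Xi`, `actions`, `pi`, `chi` encoding \(\Xi _s\), \(A^s_\xi \), \(\pi _s\), \(\chi _s\) are our own illustrative conventions, not the authors' code) enumerates all trajectories of a small system \(\mathcal{S}\) and assembles the corresponding columns \(\chi [\tau ]\) of \(D[\mathcal{S}]\). It only illustrates the definition: for the systems of interest the number of trajectories is astronomically large, and \(D\) is never formed explicitly.

```python
import numpy as np

def enumerate_columns(Xi, actions, pi, chi, m):
    """Enumerate all trajectories of S and return D[S] (columns chi[tau]).

    Xi[s]          -- the state set Xi_{s+1} (0-based lists emulate 1-based time),
    actions[s][xi] -- the action set A^{s+1}_xi,
    pi[s](xi, a)   -- the transition mapping,
    chi[s](xi, a)  -- the output vector in R^{r_{s+1}}."""
    columns, trajectories = [], []

    def extend(s, xi, outputs, traj):
        if s == m:                         # trajectory complete: record its output
            columns.append(np.concatenate(outputs))
            trajectories.append(list(traj))
            return
        for a in actions[s][xi]:
            nxt = pi[s](xi, a) if s < m - 1 else None
            extend(s + 1, nxt,
                   outputs + [np.asarray(chi[s](xi, a), dtype=float)],
                   traj + [(xi, a)])

    for xi1 in Xi[0]:                      # all possible initial states
        extend(0, xi1, [], [])
    return np.column_stack(columns), trajectories
```

The recursion visits every trajectory once, so its cost is proportional to the number of columns of \(D\); the point of the dynamic programming construction below is precisely to avoid this enumeration.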

For example, the knapsack-generated matrix D associated with the knapsack data from Sect. 2.6.2 is of the form \(D[\mathcal{S}]\), with the system \(\mathcal{S}\) defined as follows:

  • \(\Xi _s\), \(s=1,\ldots ,m\), is the set of nonnegative integers which are \(\le H\);

  • \(A^s_\xi \) is the set of nonnegative integers a such that \(a\le \bar{p}_s\) and \(\xi -ah_s\ge 0\);

  • the transition mappings are \(\pi _{s}(\xi ,a)=\xi -ah_s\);

  • the outputs are \(\chi _s(\xi ,a)=f_s(a)\), \(1\le s\le m\).

In the notation of Sect. 2.6.2, the vectors \([p_1;\ldots ;p_m]\in \mathcal{P}\) are exactly the sequences of actions \(a_1,\ldots ,a_m\) stemming from the trajectories of the system \(\mathcal{S}\) just defined.

Observe that matrix \(D=D[\mathcal{S}]\) is simple, provided the cardinalities of \(\Xi _s\) and \(A^s_\xi \) are reasonable. Indeed, given \(x=[x_1;\ldots ;x_m]\in {\mathbb {R}}^{n}={\mathbb {R}}^{r_1}\times \cdots \times {\mathbb {R}}^{r_m}\), we can identify \(\overline{D}[x]\) by dynamic programming, running first the backward Bellman recurrence

$$\begin{aligned} \left. \begin{array}{rcl} U_{s}(\xi )&{}=&{}\max \limits _{a\in A^s_\xi }\left\{ x_s^T\chi _s(\xi ,a)+U_{s+1}(\pi _s(\xi ,a))\right\} \\ A_s(\xi )&{}=&{}\mathop {\mathrm{Argmax}\,}\limits _{a\in A^s_\xi }\left\{ x_s^T\chi _s(\xi ,a)+U_{s+1}(\pi _s(\xi ,a))\right\} \\ \end{array}\right\} , \xi \in \Xi _s,\, s=m,m-1,\ldots ,1 \end{aligned}$$

(where \(U_{m+1}(\cdot )\equiv 0\)), and then recovering the (trajectory indexing the) column of D corresponding to \(\overline{D}[x]\) by running the forward Bellman recurrence

$$\begin{aligned} \begin{array}{rcl} \xi _1&{}\in &{}\mathop {\mathrm{Argmax}\,}_{\xi \in \Xi _1} U_1(\xi )\Rightarrow a_1\in A_1(\xi _1)\Rightarrow \cdots \\ \Rightarrow \xi _{s+1}&{}=&{}\pi _s(\xi _s,a_s)\Rightarrow a_{s+1}\in A_{s+1}(\xi _{s+1})\Rightarrow \cdots \\ \end{array}, s=1,2,\ldots ,m-1. \end{aligned}$$
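Below is a minimal Python sketch of this computation (using the same illustrative containers as in the previous snippet; again an assumption of ours, not the paper's implementation): the backward pass computes the value functions \(U_s\) and maximizing actions, and the forward pass recovers a maximizing trajectory, i.e., the index and the column \(\overline{D}[x]\).

```python
import numpy as np

def lmo_dp(x_blocks, Xi, actions, pi, chi, m):
    """Linear minimization oracle for D[S]: return a trajectory tau maximizing
    x^T chi[tau] together with the column chi[tau] itself.
    x_blocks[s] plays the role of x_{s+1} in the paper's 1-based notation."""
    U = [None] * (m + 1)                    # value functions U_s (U_{m+1} = 0)
    A_star = [None] * m                     # a maximizing action for each (s, xi)
    for s in range(m - 1, -1, -1):          # backward Bellman recurrence
        Us, As = {}, {}
        for xi in Xi[s]:
            best_val, best_act = -np.inf, None
            for a in actions[s][xi]:
                cont = U[s + 1][pi[s](xi, a)] if s < m - 1 else 0.0
                val = float(np.dot(x_blocks[s], chi[s](xi, a))) + cont
                if val > best_val:
                    best_val, best_act = val, a
            Us[xi], As[xi] = best_val, best_act
        U[s], A_star[s] = Us, As

    xi = max(Xi[0], key=lambda z: U[0][z])  # forward Bellman recurrence
    traj, outputs = [], []
    for s in range(m):
        a = A_star[s][xi]
        traj.append((xi, a))
        outputs.append(np.asarray(chi[s](xi, a), dtype=float))
        if s < m - 1:
            xi = pi[s](xi, a)
    return traj, np.concatenate(outputs)
```

For the knapsack-generated matrix above, \(|\Xi _s|\le H+1\) and \(|A^s_\xi |\le \bar{p}_s+1\), so a call to this oracle costs roughly \(O(\sum _s H\bar{p}_sr_s)\) operations, whereas the number of columns of \(D\) (the number of trajectories) is typically astronomically large; this is what makes \(D[\mathcal{S}]\) simple.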

1.2 Attacker Versus Defender Via Ellipsoid Algorithm

In our implementation,

  1. Relation (39) is ensured by specifying U, V as Euclidean balls of radius R centered at the origin, where R is an upper bound on the Euclidean norms of the columns of D and of A (such a bound is easily obtained from the knapsack data specifying the matrices D, A).

  2. We process the monotone vector field associated with the primal SP problem (30), that is, the field

    $$\begin{aligned} F(u,v)=[F_u(u,v)=\overline{A}[u]-v;F_v(u,v)=u-\underline{D}[v]] \end{aligned}$$

    by ellipsoid algorithm with accuracy certificates from [20]. For \(\tau =1,2,\ldots ,\) the algorithm generates search points \([u_\tau ;v_\tau ]\in {\mathbb {R}}^K\times {\mathbb {R}}^K\), with \([u_1;v_1]=0\), along with execution protocols \(\mathcal{I}^\tau =\{[u_i;v_i],F(u_i,v_i):i\in I_\tau \}\), where \(I_\tau =\{i\le \tau :[u_i;v_i]\in U\times V\}\), augmented by accuracy certificates \(\lambda ^\tau =\{\lambda ^\tau _i\ge 0:i\in I_\tau \}\) such that \(\sum _{i\in I_\tau }\lambda ^\tau _i=1\). From the results of [20], it follows that for every \(\epsilon >0\) it holds

    $$\begin{aligned} \tau \ge N(\epsilon ):= O(1)K^2\ln \left( 2{R+\epsilon \over \epsilon }\right) \Rightarrow {\mathrm{Res}}(\mathcal{I}^\tau ,\lambda ^\tau \big |U\times V)\le \epsilon . \end{aligned}$$
    (45)
  3. When computing \(F(u_i,v_i)\) (this computation takes place only at productive steps, i.e., those with \([u_i;v_i]\in U\times V\)), we get, as a by-product, the columns \(A^i=\overline{A}[u_i]\) and \(D^i=\underline{D}[v_i]\) of the matrices A, D, along with the indexes \(a^i\), \(d^i\) of these columns (recall that these indexes are pure strategies of the attacker and the defender and thus, according to the construction of A, D, are collections of m nonnegative integers). In our implementation, we stored these columns along with their indexes and the corresponding search points \([u_i;v_i]\). As is immediately seen, in the case in question the approximate solution \([w^\tau ;z^\tau ]\) to the SP problem of interest (27) induced by the execution protocol \(\mathcal{I}^\tau \) and the accuracy certificate \(\lambda ^\tau \) consists of two sparse vectors

    $$\begin{aligned} w^\tau =\sum _{i\in I_\tau }\lambda ^\tau _i\delta ^D_{d^i},\,\,z^\tau =\sum _{i\in I_\tau }\lambda ^\tau _i\delta ^A_{a^i}, \end{aligned}$$
    (46)

    where \(\delta ^D_d\) is the “dth basic orth” in the simplex \(\varDelta _N\) of probabilistic vectors with entries indexed by pure strategies of the defender, and similarly for \(\delta ^A_a\). Thus, we have no difficulties representing our approximate solutions (see Note 8), in spite of their huge ambient dimension.

According to our general theory and (45), the number of steps needed to get an \(\epsilon \)-solution \([w;z]\) to the problem of interest (i.e., a feasible solution with \(\epsilon _{{\tiny \mathrm sad}}([w;z]\big |\psi ,W,Z)\le \epsilon \)) does not exceed \(N(\epsilon )\), with the computational effort per step dominated by the necessity to identify \(\overline{A}[u_i]\), \(\underline{D}[v_i]\) by dynamic programming.
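A single productive step of this scheme can be sketched as follows (Python; `lmo_A_max` and `lmo_D_max` are hypothetical wrappers, e.g., around `lmo_dp` above, returning the index and the column of \(A\), resp. \(D\), maximizing the inner product with the argument; reading \(\underline{D}[v]\) as the column of \(D\) minimizing \(\langle v,\cdot \rangle \) is our interpretation of the notation in (30)).

```python
import numpy as np

def field_F(u, v, lmo_A_max, lmo_D_max):
    """Evaluate F(u, v) = [Abar[u] - v; u - Dunder[v]] from (30) at a productive step.

    lmo_A_max / lmo_D_max are assumed to return (index, column) of a column of
    A / D maximizing the inner product with their argument; a minimizing column
    of D is obtained by maximizing against -v."""
    a_idx, A_col = lmo_A_max(u)        # Abar[u]: a pure strategy of the attacker
    d_idx, D_col = lmo_D_max(-v)       # Dunder[v]: a pure strategy of the defender
    F = np.concatenate([A_col - v, u - D_col])
    # the indexes and columns are what gets stored for assembling (46) later on
    return F, (a_idx, A_col), (d_idx, D_col)
```

The only expensive ingredients are the two oracle calls, which for knapsack-generated \(A\), \(D\) reduce to the dynamic programming recurrences of the previous subsection.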

In fact, we used the outlined scheme with two straightforward modifications.

  • First, instead of building the accuracy certificates \(\lambda ^\tau \) according to the rules from [20], we used the best accuracy certificates for the given execution protocols \(\mathcal{I}^\tau \), obtained by solving the convex program

    $$\begin{aligned} \min _\lambda \left\{ {\mathrm{Res}}(\mathcal{I}^\tau ,\lambda \big |U\times V)\!:=\!\max _{y\in U\times V}\sum _{i\in I_\tau } \lambda _i \langle F(u_i,v_i),[u_i;v_i]-y\rangle :\lambda _i\!\ge \!0,\sum _{i\in I_\tau }\lambda _i\!=\!1\right\} . \end{aligned}$$
    (47)

    In our implementation, this problem was solved once per \(4K^2\) steps. Note that with U, V being Euclidean balls, (47) is a Conic Quadratic Problem and may be solved using, e.g., CVX [27].

  • Second, given current approximate solution (46) to the problem of interest, we can compute its saddle point inaccuracy exactly instead of upper-bounding it by \({\mathrm{Res}}(\mathcal{I}^\tau ,\lambda ^\tau \big |U\times V)\). Indeed, it is immediately seen that

    $$\begin{aligned} \epsilon _{{\tiny \mathrm sad}}([w^\tau ;z^\tau ]\big |\psi ,W,Z)\!=\!{\hbox {Max}}\left( A^T\left[ \sum _{i\in I_\tau }\lambda ^\tau _iD^i\right] \right) -{\hbox {Min}}\left( D^T\left[ \sum _{i\in I_\tau }\lambda ^\tau _iA^i\right] \right) . \end{aligned}$$

    In our implementation, we performed this computation each time a new accuracy certificate was computed, and terminated the solution process when the saddle point inaccuracy fell below a given threshold (\(10^{-4}\)).
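For completeness, here is a minimal sketch of the bookkeeping behind (46) and of this termination test (Python; `lams`, `d_idx`, `a_idx`, `D_cols`, `A_cols` are hypothetical names for the certificate weights \(\lambda ^\tau _i\) and the stored indexes \(d^i,a^i\) and columns \(D^i,A^i\); the oracle wrappers are those of the previous sketch, so that \({\hbox {Max}}(A^Tw)\) and \({\hbox {Min}}(D^Tz)\) are evaluated without ever forming \(A\) or \(D\)).

```python
import numpy as np

def sparse_solution(lams, d_idx, a_idx):
    """Sparse representation of (46): dictionaries mapping a pure strategy
    (an index d^i or a^i, e.g. a tuple of m integers) to its total weight."""
    w, z = {}, {}
    for lam, d, a in zip(lams, d_idx, a_idx):
        w[d] = w.get(d, 0.0) + lam
        z[a] = z.get(a, 0.0) + lam
    return w, z

def exact_sad_inaccuracy(lams, D_cols, A_cols, lmo_A_max, lmo_D_max):
    """epsilon_sad([w^tau; z^tau]) = Max(A^T sum_i lam_i D^i) - Min(D^T sum_i lam_i A^i),
    computed via two oracle calls rather than via the huge matrices A, D."""
    wbar = sum(lam * Dc for lam, Dc in zip(lams, D_cols))   # sum_i lam_i D^i
    zbar = sum(lam * Ac for lam, Ac in zip(lams, A_cols))   # sum_i lam_i A^i
    _, A_best = lmo_A_max(wbar)        # column of A maximizing <wbar, column>
    _, D_best = lmo_D_max(-zbar)       # column of D minimizing <zbar, column>
    return float(np.dot(A_best, wbar) - np.dot(D_best, zbar))
```

Since the exact inaccuracy never exceeds \({\mathrm{Res}}(\mathcal{I}^\tau ,\lambda ^\tau \big |U\times V)\), this test can only tighten the termination criterion.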

Proof of Proposition 3.2

(i): Let \(\xi _1,\xi _2\in \Xi \), and let \(\eta _1=\overline{\eta }(\xi _1)\), \(\eta _2=\overline{\eta }(\xi _2)\). By (32), we have

$$\begin{aligned} \langle \varPsi (\xi _2),\xi _2-\xi _1\rangle \ge & {} \langle \varPhi (\xi _2,\eta _2),[\xi _2-\xi _1;\eta _2-\eta _1]\rangle ,\\ \langle \varPsi (\xi _1),\xi _1-\xi _2\rangle \ge & {} \langle \varPhi (\xi _1,\eta _1),[\xi _1-\xi _2;\eta _1-\eta _2]\rangle .\\ \end{aligned}$$

Summing these inequalities, we get

$$\begin{aligned} \langle \varPsi (\xi _2)-\varPsi (\xi _1),\xi _2-\xi _1\rangle \ge \langle \varPhi (\xi _2,\eta _2)-\varPhi (\xi _1,\eta _1),[\xi _2-\xi _1;\eta _2-\eta _1]\rangle \ge 0, \end{aligned}$$

so that \(\varPsi \) is monotone.

Furthermore, the first inequality in (35) is due to Proposition 3.1. To prove the second inequality in (35), let \(\mathcal{I}_t\!=\!\{\xi _i\in \Xi ,\varPsi (\xi _i):1\le i\le t\}\), \(\mathcal{J}_t\!=\!\{\theta _i:=[\xi _i;\overline{\eta }(\xi _i)],\varPhi (\theta _i):1\le i\le t\}\), and let \(\lambda \) be a t-step accuracy certificate. We have

$$\begin{aligned} \begin{array}{l} \theta =[\xi ;\eta ]\in \Theta \Rightarrow \\ \sum _{i=1}^t\lambda _i\langle \varPhi (\theta _i),\theta _i-\theta \rangle \le \sum _{i=1}^t\lambda _i \langle \varPsi (\xi _i),\xi _i-\xi \rangle \hbox { [see (32)]}\\ \quad \le {\mathrm{Res}}(\mathcal{I}_t,\lambda \big |\Xi )\\ \quad \Rightarrow {\mathrm{Res}}(\mathcal{J}_t,\lambda \big |\Theta )=\sup _{\theta =[\xi ;\eta ]\in \Theta } \sum _{i=1}^t\lambda _i\langle \varPhi (\theta _i),\theta _i-\theta \rangle \le {\mathrm{Res}}(\mathcal{I}_t,\lambda \big |\Xi ). \end{array} \end{aligned}$$

(i) is proved.

(ii): Let \(\eta \in H\). Invoking (34), we have

$$\begin{aligned} \langle \Gamma (\eta ),\widehat{\eta }-\eta \rangle \le \langle \varPhi (\overline{\xi }(\eta ),\eta ), [\widehat{\xi };\widehat{\eta }]-[\overline{\xi }(\eta );\eta ]\rangle \le \epsilon _{{\tiny \mathrm VI}}(\widehat{\theta }\big |\varPhi ,\Theta ), \end{aligned}$$

and (36) follows. \(\square \)


Cite this article

Cox, B., Juditsky, A. & Nemirovski, A. Decomposition Techniques for Bilinear Saddle Point Problems and Variational Inequalities with Affine Monotone Operators. J Optim Theory Appl 172, 402–435 (2017). https://doi.org/10.1007/s10957-016-0949-3
