
Decomposition and discrete approximation methods for solving two-stage distributionally robust optimization problems

Computational Optimization and Applications

Abstract

Decomposition methods have been well studied for solving two-stage and multi-stage stochastic programming problems; see Rockafellar and Wets (Math. Oper. Res. 16:119–147, 1991), Ruszczyński and Shapiro (Stochastic Programming, Handbook in OR & MS, North-Holland Publishing Company, Amsterdam, 2003) and Ruszczyński (Math. Program. 79:333–353, 1997). In this paper, we propose an algorithmic framework based on the fundamental ideas of these methods for solving two-stage minimax distributionally robust optimization (DRO) problems where the underlying random variables take a finite number of distinct values. This is achieved by introducing nonanticipativity constraints for the first-stage decision variables, rearranging the minimax problem through Lagrange decomposition and applying the well-known primal-dual hybrid gradient (PDHG) method to the new minimax problem. The algorithmic framework does not depend on the specific structure of the ambiguity set. To extend the algorithm to the case where the underlying random variables are continuously distributed, we propose a discretization scheme and quantify the error arising from the discretization in terms of the optimal value and the optimal solutions when the ambiguity set is constructed through generalized prior moment conditions, the Kantorovich ball and \(\phi\)-divergence centred at an empirical probability distribution. Preliminary numerical tests show that the proposed decomposition algorithm, equipped with parallel computing, performs well.
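For orientation, the PDHG scheme invoked above is the first-order primal-dual method of Chambolle and Pock [4]. A minimal sketch of its generic iteration for a saddle-point problem \(\min _{x}\max _{y}\, \langle Kx,y\rangle +g(x)-f^*(y)\), with \(g\) and \(f^*\) proper closed convex, reads as follows; this is the textbook template only, not the paper's specific update rule for the Lagrange-decomposed DRO problem:

$$\begin{aligned} y^{k+1}&= {\text {prox}}_{\sigma f^*}\big (y^{k}+\sigma K{\bar{x}}^{k}\big ),\\ x^{k+1}&= {\text {prox}}_{\tau g}\big (x^{k}-\tau K^{*}y^{k+1}\big ),\\ {\bar{x}}^{k+1}&= x^{k+1}+\theta \big (x^{k+1}-x^{k}\big ), \end{aligned}$$

where \(\tau ,\sigma >0\) are step sizes satisfying \(\tau \sigma \Vert K\Vert ^2<1\) and \(\theta \in [0,1]\) is an extrapolation parameter; convergence for \(\theta =1\) is established in [4].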


Notes

  1. In some literature, the total variation metric is defined as \(\mathsf {dl}_{TV}=\sup _{B\in \mathscr {B}}|P(B) - Q(B)|\), see [9]; this differs from the dual-norm definition used in this paper by a factor of two.

References

  1. Athreya, K.B., Lahiri, S.N.: Measure Theory and Probability Theory. Springer, New York (2006)

  2. Bertsimas, D., Doan, X.V., Natarajan, K., Teo, C.P.: Models for minimax stochastic linear optimization problems with risk aversion. Math. Oper. Res. 35, 580–602 (2010)

  3. Bertsimas, D., Van Parys, B.: Bootstrap robust prescriptive analytics. arXiv:1711.09974 (2017)

  4. Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vis. 40, 120–145 (2011)

  5. Delage, E., Ye, Y.: Distributionally robust optimization under moment uncertainty with application to data-driven problems. Oper. Res. 58, 592–612 (2010)

  6. Esser, E., Zhang, X., Chan, T.: A general framework for a class of first order primal-dual algorithms for convex optimization in imaging science. SIAM J. Imaging Sci. 3, 1015–1046 (2010)

  7. Fan, K.: Minimax theorems. Proc. Natl. Acad. Sci. USA 39, 42–47 (1953)

  8. Gao, R., Kleywegt, A.: Distributionally robust stochastic optimization with Wasserstein distance. arXiv:1604.02199 (2016)

  9. Gibbs, A.L., Su, F.E.: On choosing and bounding probability metrics. Int. Stat. Rev. 70, 419–435 (2002)

  10. Goh, J., Sim, M.: Distributionally robust optimization and its tractable approximations. Oper. Res. 58, 902–917 (2010)

  11. Goldstein, T., Li, M., Yuan, X., Esser, E., Baraniuk, R.: Adaptive primal-dual hybrid gradient methods for saddle-point problems. arXiv:1305.0546 (2013)

  12. Guo, S., Xu, H., Zhang, L.: Convergence analysis for mathematical programs with distributionally robust chance constraint. SIAM J. Optim. 27, 784–816 (2017)

  13. Guo, S., Xu, H.: Distributionally robust shortfall risk optimization model and its approximation. Math. Program. 174, 473–498 (2019)

  14. Hanasusanto, G.A., Kuhn, D.: Conic programming reformulations of two-stage distributionally robust linear programs over Wasserstein balls. Oper. Res. 66, 849–869 (2018)

  15. He, B., Ma, F., Yuan, X.: An algorithmic framework of generalized primal-dual hybrid gradient methods for saddle point problems. J. Math. Imaging Vis. 58, 279–293 (2017)

  16. He, B., Yuan, X.: Convergence analysis of primal-dual algorithms for a saddle-point problem: from contraction perspective. SIAM J. Imaging Sci. 5, 119–149 (2012)

  17. Jiang, R., Guan, Y.: Risk-averse two-stage stochastic program with distributional ambiguity. Oper. Res. 66, 1390–1405 (2018)

  18. Liu, Y., Pichler, A., Xu, H.: Discrete approximation and quantification in distributionally robust optimization. Math. Oper. Res. 44, 19–37 (2019)

  19. Liu, Y., Yuan, X., Zeng, S., Zhang, J.: Primal-dual hybrid gradient method for distributionally robust optimization problems. Oper. Res. Lett. 45, 625–630 (2017)

  20. Liu, Y., Yuan, X., Zhang, J.: Quantitative stability analysis of stochastic programs with distributionally robust second order dominance constraints. Manuscript (2017)

  21. Love, D., Bayraksan, G.: Phi-divergence constrained ambiguous stochastic programs for data-driven optimization. Available on researchgate.net (2016)

  22. Mohajerin Esfahani, P., Kuhn, D.: Data-driven distributionally robust optimization using the Wasserstein metric: performance guarantees and tractable reformulations. Math. Program. 171, 115–166 (2018)

  23. Pardo, L.: Statistical Inference Based on Divergence Measures. Chapman and Hall/CRC, Boca Raton (2005)

  24. Pflug, G.C., Pichler, A.: Approximations for probability distributions and stochastic optimization problems. In: Bertocchi, M., Consigli, G., Dempster, M.A.H. (eds.) Stochastic Optimization Methods in Finance and Energy. International Series in Operations Research & Management Science, vol. 163. Springer, New York (2011)

  25. Pflug, G.C., Pichler, A.: Multistage Stochastic Optimization. Springer Series in Operations Research and Financial Engineering. Springer, New York (2014)

  26. Pflug, G.C., Wozabal, D.: Ambiguity in portfolio selection. Quant. Finance 7, 435–442 (2007)

  27. Pichler, A., Xu, H.: Quantitative stability analysis for minimax distributionally robust risk optimization. To appear in Math. Program. (2018)

  28. Rachev, S.T.: Probability Metrics and the Stability of Stochastic Models. Wiley, West Sussex (1991)

  29. Rockafellar, R.T., Wets, R.J.-B.: Scenarios and policy aggregation in optimization under uncertainty. Math. Oper. Res. 16, 119–147 (1991)

  30. Rockafellar, R.T., Sun, J.: Solving monotone stochastic variational inequalities and complementarity problems by progressive hedging. Math. Program. 174, 453–471 (2019)

  31. Römisch, W.: Stability of stochastic programming problems. In: Ruszczyński, A., Shapiro, A. (eds.) Stochastic Programming. Handbook in OR & MS, vol. 10. North-Holland Publishing Company, Amsterdam (2003)

  32. Ruszczyński, A.: Decomposition methods. In: Ruszczyński, A., Shapiro, A. (eds.) Stochastic Programming. Handbook in OR & MS, vol. 10. North-Holland Publishing Company, Amsterdam (2003)

  33. Ruszczyński, A.: Decomposition methods in stochastic programming. Math. Program. 79, 333–353 (1997)

  34. Rahimian, H., Bayraksan, G., Homem-de-Mello, T.: Identifying effective scenarios in distributionally robust stochastic programs with total variation distance. Math. Program. 173, 393–430 (2019)

  35. Ruszczyński, A., Shapiro, A. (eds.): Stochastic Programming. Handbook in OR & MS, vol. 10. North-Holland Publishing Company, Amsterdam (2003)

  36. Scarf, H.: A min–max solution of an inventory problem. In: Arrow, K.J., Karlin, S., Scarf, H.E. (eds.) Studies in the Mathematical Theory of Inventory and Production, pp. 201–209. Stanford University Press, Palo Alto (1958)

  37. Shapiro, A., Ahmed, S.: On a class of minimax stochastic programs. SIAM J. Optim. 14, 1237–1249 (2004)

  38. Shapiro, A.: On duality theory of conic linear problems. In: Goberna, M.A., López, M.A. (eds.) Semi-Infinite Programming. Nonconvex Optimization and Its Applications, vol. 57. Springer, Boston (2001)

  39. Sun, J., Liao, L.Z., Rodrigues, B.: Quadratic two-stage stochastic optimization with coherent measures of risk. Math. Program. 168, 599–613 (2018)

  40. Sun, H., Xu, H.: Convergence analysis for distributionally robust optimization and equilibrium problems. Math. Oper. Res. 41, 377–401 (2016)

  41. Weiss, P., Blanc-Féraud, L., Aubert, G.: Efficient schemes for total variation minimization under constraints in image processing. SIAM J. Sci. Comput. 31, 2047–2080 (2009)

  42. Wiesemann, W., Kuhn, D., Sim, M.: Distributionally robust convex optimization. Oper. Res. 62, 1358–1376 (2014)

  43. Xu, H., Liu, Y., Sun, H.: Distributionally robust optimization with matrix moment constraints: Lagrange duality and cutting plane methods. Math. Program. 169, 489–529 (2018)

  44. Zhang, Y., Jiang, R., Shen, S.: Ambiguous chance-constrained binary programs under mean-covariance information. SIAM J. Optim. 28, 2922–2944 (2018)

  45. Zhang, X., Burger, M., Osher, S.: A unified primal-dual algorithm framework based on Bregman iteration. J. Sci. Comput. 46, 20–46 (2010)

  46. Zhao, C., Guan, Y.: Data-driven risk-averse two-stage stochastic program with ζ-structure probability metrics. Available at Optimization Online (2015). http://www.optimization-online.org/DB_FILE/2015/07/5014.pdf

  47. Zolotarev, V.M.: Probability metrics. Teoriya Veroyatnostei i ee Primeneniya 28, 264–287 (1983)

  48. Zhang, Z., Ahmed, S., Lan, G.: Efficient algorithms for distributionally robust stochastic optimization with discrete scenario support. arXiv:1909.11216 (2019)

  49. Zhu, M., Chan, T.F.: An efficient primal-dual hybrid gradient algorithm for total variation image restoration. CAM Report 08-34, UCLA, Los Angeles, CA (2008)


Acknowledgements

We would like to thank Shabbir Ahmed for initiating this research in a private discussion with the third author during the 14th International Conference on Stochastic Programming in Búzios, and for his further encouragement during the preparation of the paper. We would also like to thank the two anonymous referees for insightful comments which helped us significantly strengthen the presentation of the paper.

Funding

Funding was provided by the National Natural Science Foundation of China (Grant Nos. 11871276 and 11771405).

Author information


Corresponding author

Correspondence to Hailin Sun.


Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Proof of Lemma 1

The inequality holds trivially if \(Q\in {\tilde{{{\mathcal {P}}}}}\), so we only consider the case when \(Q\not \in {\tilde{{{\mathcal {P}}}}}\). We proceed with the proof in three steps.

Step 1. By the definition of the total variation norm (see [1]), \(\Vert P\Vert = \sup _{\Vert \phi \Vert ^*\le 1} \langle P, \phi \rangle ,\) where \(\Vert \cdot \Vert ^*\) denotes the dual norm of \(\Vert \cdot \Vert\). Moreover, by the definition of the total variation metric,

$$\begin{aligned} \mathsf {dl}_{TV}(Q,\tilde{{{\mathcal {P}}}})= & {} \inf _{P\in {\tilde{{{\mathcal {P}}}}}} \mathsf {dl}_{TV}(Q,P) \\= & {} \inf _{P\in \text{ cl } \{P\in \mathscr {P}(\tilde{\varXi }): {\mathbb {E}}_P[\varPsi _E]= \mu _E, {\mathbb {E}}_P[\varPsi _I]\preceq \mu _I\}} \sup _{\Vert \phi \Vert ^*\le 1} \langle Q-P, \phi \rangle \\= & {} \sup _{\Vert \phi \Vert ^*\le 1} \min _{P\in \text{ cl } \{P\in \mathscr {P}(\tilde{\varXi }): {\mathbb {E}}_P[\varPsi _E]= \mu _E, {\mathbb {E}}_P[\varPsi _I]\preceq \mu _I\}} \langle Q-P, \phi \rangle \\= & {} \sup _{\Vert \phi \Vert ^*\le 1} \inf _{P\in \{P\in \mathscr {P}(\tilde{\varXi }): {\mathbb {E}}_P[\varPsi _E]= \mu _E, {\mathbb {E}}_P[\varPsi _I]\preceq \mu _I\}} \langle Q-P, \phi \rangle , \end{aligned}$$

where \(\text{ cl }(\cdot )\) denotes the closure of a set under the topology of weak convergence, \(\langle P, \phi \rangle :=\int _{\tilde{\varXi }}\phi (\xi )P(d\xi )\), and the exchange of the infimum and supremum is justified by [7, Theorem 2] under our assumption that \(\tilde{{{\mathcal {P}}}}\) is weakly compact. We write \(\langle P, \phi \rangle\) rather than \({\mathbb {E}}_P[\phi ]\) because later on we will relax P from a probability measure to a positive measure; this notation makes it clear that P is a variable in the moment system and that the moment system is linear in P. It is easy to observe that \(\mathsf {dl}_{TV}(Q,{\tilde{{{\mathcal {P}}}}})\le 2\). Moreover, under the Slater type condition (16), it follows by [38, Proposition 3.4] that

$$\begin{aligned}&\inf _{P\in \{P\in \mathscr {P}(\tilde{\varXi }): {\mathbb {E}}_P[\varPsi _E]= \mu _E, {\mathbb {E}}_P[\varPsi _I]\preceq \mu _I\}} \langle Q-P, \phi \rangle \nonumber \\&\quad =\sup _{\lambda \in \varLambda ,\lambda _0} \inf _{P\in \mathscr {M}_+(\tilde{\varXi })} \langle Q-P, \phi \rangle + \lambda \bullet (\langle P,\varPsi \rangle -\mu ) + \lambda _0(\langle P,1\rangle -1)\nonumber \\&\quad =\sup _{\lambda \in \varLambda ,\lambda _0} \inf _{P\in \mathscr {M}_+(\tilde{\varXi })} \langle Q-P, \phi -\lambda \bullet \varPsi -\lambda _0\rangle \nonumber \\&\qquad + \langle Q, \lambda \bullet \varPsi +\lambda _0\rangle -\lambda \bullet \mu -\lambda _0, \end{aligned}$$
(49)

where \(\mathscr {M}_+(\tilde{\varXi })\) denotes the set of all positive measures defined on \(\tilde{\varXi }\), \(\varLambda :=\{(\lambda _1,\cdots ,\lambda _q): \lambda _i\succeq 0, \; \text{ for }\; i=p+1,\ldots ,q\},\) and \(A \bullet B\) denotes the Frobenius product of the two matrices A and B. If there exists some \(\xi _i\) such that \(\phi (\xi _i) -\lambda \bullet \varPsi (\xi _i) -\lambda _0>0\), then the value of (49) is \(-\infty\): we can choose \(P=\alpha \delta _{\xi _i}(\cdot )\), where \(\delta _{\xi _i}(\cdot )\) denotes the Dirac probability measure at \(\xi _i\), and drive \(\alpha\) to \(+\infty\). Thus we are left to consider the case where

$$\begin{aligned} \phi (\xi ) -\lambda \bullet \varPsi (\xi ) -\lambda _0 \le 0, \xi \in {\tilde{\varXi }}. \end{aligned}$$

Consequently we can rewrite (49) as

$$\begin{aligned}&\inf _{P\in \{P\in \mathscr {P}(\tilde{\varXi }): {\mathbb {E}}_P[\varPsi _E] = \mu _E, {\mathbb {E}}_P[\varPsi _I]\preceq \mu _I\}} \langle Q-P, \phi \rangle \\&\quad = \sup _{\lambda \in \varLambda ,\lambda _0} \inf _{P\in \mathscr {M}_+(\tilde{\varXi })} \langle Q-P, \phi -\lambda \bullet \varPsi -\lambda _0\rangle + \langle Q, \lambda \bullet \varPsi +\lambda _0\rangle -\lambda \bullet \mu -\lambda _0\\&\quad = \sup _{\lambda \in \varLambda ,\lambda _0}\langle Q, \phi -\lambda \bullet \varPsi -\lambda _0\rangle + \langle Q, \lambda \bullet \varPsi +\lambda _0\rangle -\lambda \bullet \mu -\lambda _0\\&\quad =\sup _{\lambda \in \varLambda ,\lambda _0} \langle Q, \phi \rangle -\lambda \bullet \mu -\lambda _0. \end{aligned}$$

The second equality is due to the fact that the infimum is attained at \(P=0\). Summarizing the discussion above, we arrive at

$$\begin{aligned} \mathsf {dl}_{TV}(Q,{\tilde{{{\mathcal {P}}}}})= & {} \left\{ \begin{array}{cl} \displaystyle \sup _{\Vert \phi \Vert ^*\le 1} \sup _{\lambda \in \varLambda ,\lambda _0} &{} \langle Q, \phi \rangle -\lambda \bullet \mu -\lambda _0 \\ \text{ s.t. } &{}\phi (\xi ) -\lambda \bullet \varPsi (\xi ) -\lambda _0 \le 0, \;\;\xi \in {\tilde{\varXi }} \end{array}\right. \nonumber \\= & {} \left\{ \begin{array}{cl} \displaystyle \sup _{\lambda \in \varLambda ,\lambda _0}\sup _{\Vert \phi \Vert ^*\le 1} &{} \langle Q, \phi \rangle -\lambda \bullet \mu -\lambda _0 \\ \text{ s.t. } &{}\phi (\xi ) -\lambda \bullet \varPsi (\xi ) -\lambda _0 \le 0, \;\;\xi \in {\tilde{\varXi }} \end{array}\right. \nonumber \\= & {} \left\{ \begin{array}{cl} \displaystyle \sup _{\lambda \in \varLambda ,\lambda _0} &{} \langle Q, \min \{\lambda \bullet \varPsi (\xi )+\lambda _0, 1\}\rangle -\lambda \bullet \mu -\lambda _0 \\ \text{ s.t. } &{} -1 -\lambda \bullet \varPsi (\xi ) -\lambda _0 \le 0,\;\;\text{ a.e. } \; \; \xi \in {\tilde{\varXi }}. \end{array}\right. \end{aligned}$$
(50)

The first equality is obtained by swapping the two “sup” operations, which is legitimate because the optimal values are bounded. To see why the second equality holds, compare the optimal values of the second and third programs above; denote the second by (P) and the third by (P'). Program (P') is obtained from (P) by replacing \(\phi (\xi )\) in the objective with its largest admissible value \(\min \{\lambda \bullet \varPsi (\xi )+\lambda _0, 1\}\) and replacing it in the constraint with its smallest admissible value \(-1\), which makes the feasible set as large as possible. Hence the optimal value of (P) is less than or equal to that of (P'). Conversely, for any optimal solution \((\lambda ^*, \lambda _0^*)\) of (P'), let \(\phi ^*:=\min \{\lambda ^*\bullet \varPsi (\xi )+\lambda _0^*, 1\}\). Then \((\lambda ^*, \lambda _0^*, \phi ^*)\) is a feasible solution of (P), which in turn implies that the optimal value of (P') is less than or equal to that of (P). Note also that the optimal value is \(\mathsf {dl}_{TV}(Q,{\tilde{{{\mathcal {P}}}}})\in [0,2]\).
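As a concrete illustration of the dual representation used in Step 1 (an aside, not part of the proof), suppose \(P=\sum _i p_i\delta _{\xi _i}\) and \(Q=\sum _i q_i\delta _{\xi _i}\) are supported on finitely many common points, so that \(\Vert \phi \Vert ^*\le 1\) amounts to \(|\phi (\xi _i)|\le 1\) for all i. The supremum is then attained at \(\phi (\xi _i)={\text {sign}}(q_i-p_i)\) and

$$\begin{aligned} \mathsf {dl}_{TV}(Q,P)=\sup _{\Vert \phi \Vert ^*\le 1}\langle Q-P,\phi \rangle =\sum _i |q_i-p_i|\le \sum _i (q_i+p_i)=2, \end{aligned}$$

consistent with the bound \(\mathsf {dl}_{TV}(Q,{\tilde{{{\mathcal {P}}}}})\in [0,2]\) just noted.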

Step 2. We show that the optimization problem on the right-hand side of (50) has a bounded optimal solution. Let

$$\begin{aligned} {{\mathcal {F}}}:=\{(\lambda ,\lambda _0) \in \varLambda \times \mathbb {R}: -1 -\lambda \bullet \varPsi (\xi ) -\lambda _0 \le 0,\;\; \forall \xi \in {\tilde{\varXi }}\} \end{aligned}$$

denote the feasible set of (50) and

$$\begin{aligned} {{\mathcal {C}}}:=\{(\lambda ,\lambda _0) \in \varLambda \times \mathbb {R}: \Vert \lambda \Vert +|\lambda _0|=1, -\lambda \bullet \varPsi (\xi ) -\lambda _0 \le 0,\;\; \forall \xi \in {\tilde{\varXi }}\}. \end{aligned}$$
(51)

Then

$$\begin{aligned} {{\mathcal {F}}}=\{(0_q,-1)+t{{\mathcal {C}}}: t\in [0,+\infty )\}. \end{aligned}$$

To see this, let \((\lambda , \lambda _0)\in {{\mathcal {F}}}\) and \(t=\Vert \lambda \Vert +|\lambda _0+1|\). If \(t=0\), then \((\lambda ,\lambda _0)=(0_q,-1)\in \{(0_q,-1)+t{{\mathcal {C}}}: t\in [0,+\infty )\}\). Consider the case that \(t\ne 0\). Then it is easy to verify that \((\lambda , \lambda _0+1)/t\in {{\mathcal {C}}}\), and hence \((\lambda , \lambda _0)\in (0_q,-1)+t{{\mathcal {C}}}\). This shows \({{\mathcal {F}}}\subset \{(0_q,-1)+t{{\mathcal {C}}}: t\in [0,+\infty )\}\). The converse inclusion is obvious.

Note that \({{\mathcal {C}}}\ne \emptyset\). Otherwise we would have

$$\begin{aligned}&{\left\{ \begin{array}{cl} \displaystyle {\sup _{\lambda \in \varLambda ,\lambda _0}} &{} \langle Q, \min \{\lambda \bullet \varPsi +\lambda _0, 1\}\rangle -\lambda \bullet \mu -\lambda _0\\ \text{ s.t. } &{} -1 -\lambda \bullet \varPsi (\xi ) -\lambda _0 \le 0,\;\;\text{ a.e. } \; \; \xi \in {\tilde{\varXi }} \end{array}\right. }\\&\quad \le \sup _{(\lambda , \lambda _0)= (0,-1)} \langle Q, \lambda \bullet \varPsi \rangle -\lambda \bullet \mu =0, \end{aligned}$$

which implies \(Q\in \tilde{{{\mathcal {P}}}}\), a contradiction. In what follows, we may therefore assume \({{\mathcal {C}}}\ne \emptyset\). From (51) we immediately have

$$\begin{aligned} -\lambda \bullet \langle P, \varPsi \rangle -\lambda _0 \langle P, 1\rangle \le 0, \forall (\lambda ,\lambda _0)\in {{\mathcal {C}}}, P\in \mathscr {M}_+(\tilde{\varXi }). \end{aligned}$$
(52)

On the other hand, by Assumption 1 and (17),

$$\begin{aligned} (0,0_{q}) \in \text{ int } \; [ (\langle P, 1 \rangle -1, \langle P, \varPsi (\xi )\rangle -\mu ) - \{0\}\times \{0_p\} \times {{\mathcal {K}}}_-^{q-p}: P\in \mathscr {M}_+(\tilde{\varXi })] \end{aligned}$$

and there exists a closed neighborhood of \((0,0_{q})\) with radius \(\epsilon _0\), denoted by \(\epsilon _0{{\mathcal {B}}}\), such that

$$\begin{aligned} \epsilon _0{{\mathcal {B}}}\subset \text{ int } \; [ (\langle P, 1 \rangle -1, \langle P, \varPsi (\xi )\rangle -\mu ) - \{0\}\times \{0_p\} \times \mathcal{K}_-^{q-p}: P\in \mathscr {M}_+(\tilde{\varXi })]. \end{aligned}$$

Let \(({\tilde{\lambda }}, {\tilde{\lambda }}_0)\in {{\mathcal {C}}}\). Then there exists \(\tilde{w}\in \epsilon _0{{\mathcal {B}}}\) (depending on \(({\tilde{\lambda }}, {\tilde{\lambda }}_0)\)) such that \(\langle {\tilde{w}}, (\tilde{\lambda },{\tilde{\lambda }}_0)\rangle \le -\epsilon _0\). In other words, there exist \(\tilde{P}\in \mathscr {M}_+(\tilde{\varXi })\) and \(\eta \in {{\mathcal {K}}}_-^{q-p}\) such that

$$\begin{aligned} {\tilde{\lambda }}_0 (\langle \tilde{P}, 1 \rangle -1)+ {\tilde{\lambda }} \bullet (\langle \tilde{P}, \varPsi (\xi )\rangle -\mu ) -\eta \bullet {\tilde{\lambda }}_I \le -\epsilon _0. \end{aligned}$$
(53)

Since \(-\eta \bullet {\tilde{\lambda }}_I\ge 0\), we deduce from (52) and (53) that \(- \tilde{\lambda }_0- {\tilde{\lambda }} \bullet \mu \le -\epsilon _0.\) The inequality holds for every \(({\tilde{\lambda }}, {\tilde{\lambda }}_0)\in {{\mathcal {C}}}\). Note that for any \((\lambda ,\lambda _0)\in {{\mathcal {F}}}\), we may write it in the form

$$\begin{aligned} (\lambda ,\lambda _0) = ({\hat{\lambda }},{\hat{\lambda }}_0) + t(\tilde{\lambda },\tilde{\lambda }_0), \end{aligned}$$

where \(({\hat{\lambda }},{\hat{\lambda }}_0) =(0_q, -1)\), \((\tilde{\lambda },\tilde{\lambda }_0)\in {{\mathcal {C}}}\) and \(t\ge 0\). Observe that \(\langle Q, \min \{\lambda \bullet \varPsi +\lambda _0, 1\}\rangle \in [-1,1]\) and

$$\begin{aligned} -\lambda \bullet \mu -\lambda _0 = -\mu \bullet {\hat{\lambda }} -{\hat{\lambda }}_0 - t(\mu \bullet \tilde{\lambda }+\tilde{\lambda }_0) \le -\mu \bullet {\hat{\lambda }} -{\hat{\lambda }}_0 - t \epsilon _0=1-t \epsilon _0. \end{aligned}$$

Note that the optimal value of the optimization problem on the right-hand side of (50) is positive, which implies that for any optimal solution \((\lambda ^*, \lambda _0^*) = (0_q, -1) + (\bar{\lambda }^*, \bar{\lambda }_0^*)\) with \((\bar{\lambda }^*, \bar{\lambda }_0^*)\in t^*{{\mathcal {C}}}\) we must have \(1-\mu \bullet 0_q -(-1) - t^* \epsilon _0 = 2-t^*\epsilon _0>0\), or equivalently \(t^*< \frac{2}{\epsilon _0}.\) Let \(t_1= \frac{2}{\epsilon _0}\), \({{\mathcal {F}}}_0:=\{(0_q,-1)\}\) and \({{\mathcal {F}}}_1:={{\mathcal {F}}}_0 + \{t{{\mathcal {C}}}: 0\le t\le t_1\}\). Based on the discussion above, we conclude that the optimization problem on the right-hand side of (50) has an optimal solution in \({{\mathcal {F}}}_1\).

Step 3. Let \(C_1:=\max _{(\lambda , \lambda _0)\in {{\mathcal {F}}}_1}\Vert \lambda \Vert\). Then

$$\begin{aligned} \mathsf {dl}_{TV}(Q,{\tilde{{{\mathcal {P}}}}})= & {} \sup _{(\lambda , \lambda _0)\in {{\mathcal {F}}}_1} \langle Q, \min \{\lambda \bullet \varPsi (\xi (\omega ))+\lambda _0, 1\}\rangle -\lambda \bullet \mu -\lambda _0\\\le & {} \sup _{(\lambda , \lambda _0)\in {{\mathcal {F}}}_1} \langle Q, \lambda \bullet \varPsi (\xi (\omega ))+\lambda _0\rangle -\lambda \bullet \mu -\lambda _0\\= & {} \sup _{(\lambda , \lambda _0)\in {{\mathcal {F}}}_1}{\lambda }\bullet ({\mathbb {E}}_Q[\varPsi (\xi )]-\mu )\\\le & {} \sup _{(\lambda , \lambda _0)\in {{\mathcal {F}}}_1} \; \sum _{i=1}^p \lambda _i({\mathbb {E}}_Q[\varPsi _i(\xi )]-\mu _i) +\sum _{i=p+1}^q \lambda _i({\mathbb {E}}_Q[\varPsi _i(\xi )]-\mu _i)\\\le & {} \sup _{(\lambda , \lambda _0)\in {{\mathcal {F}}}_1} \; \sum _{i=1}^p |\lambda _i||{\mathbb {E}}_Q[\varPsi _i(\xi )]-\mu _i| +\sum _{i=p+1}^q \lambda _i({\mathbb {E}}_Q[\varPsi _i(\xi )]-\mu _i)_+\\\le & {} C_1 ( \Vert ({\mathbb {E}}_Q[\varPsi _E(\xi )]-\mu _E)\Vert + \Vert ({\mathbb {E}}_Q[\varPsi _I(\xi )]-\mu _I)_+\Vert ), \end{aligned}$$

where \(C_1\le \frac{2}{\epsilon _0}+1\) because \(\Vert \lambda \Vert \le t_1\) for every \((\lambda ,\lambda _0)\in {{\mathcal {F}}}_1\). The inequality also holds in the case \({{\mathcal {C}}}=\emptyset\) because \({{\mathcal {F}}}_0\subset {{\mathcal {F}}}_1\). Since the above argument is valid for every \({\tilde{{{\mathcal {P}}}}}\) with \(\bar{\varXi }\subset {\tilde{\varXi }}\subset \varXi\), which covers both the continuous and the discrete support set cases, the proof is complete. \(\square\)

Cite this article

Chen, Y., Sun, H. & Xu, H. Decomposition and discrete approximation methods for solving two-stage distributionally robust optimization problems. Comput Optim Appl 78, 205–238 (2021). https://doi.org/10.1007/s10589-020-00234-7
