Abstract
Decomposition methods have been well studied for solving two-stage and multi-stage stochastic programming problems; see Rockafellar and Wets (Math. Oper. Res. 16:119–147, 1991), Ruszczyński and Shapiro (Stochastic Programming, Handbook in OR & MS, North-Holland Publishing Company, Amsterdam, 2003) and Ruszczyński (Math. Program. 79:333–353, 1997). In this paper, we propose an algorithmic framework based on the fundamental ideas of these methods for solving two-stage minimax distributionally robust optimization (DRO) problems in which the underlying random variables take a finite number of distinct values. This is achieved by introducing nonanticipativity constraints for the first-stage decision variables, rearranging the minimax problem through Lagrange decomposition and applying the well-known primal-dual hybrid gradient (PDHG) method to the new minimax problem. The algorithmic framework does not depend on the specific structure of the ambiguity set. To extend the algorithm to the case where the underlying random variables are continuously distributed, we propose a discretization scheme and quantify the error arising from the discretization, in terms of both the optimal value and the optimal solutions, when the ambiguity set is constructed through generalized prior moment conditions, the Kantorovich ball and \(\phi\)-divergence centred at an empirical probability distribution. Some preliminary numerical tests show that the proposed decomposition algorithm, which naturally supports parallel computing, performs well.
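To illustrate the PDHG iteration at the heart of the framework, the following minimal Python sketch is our own toy example (the objective, step sizes, and function names are assumptions, not taken from the paper). It applies the basic primal-dual hybrid gradient scheme with extrapolation to \(\min _x\max _{|y|\le 1} y(a^{\top }x-b)+\frac{\mu }{2}\Vert x\Vert ^2\), whose primal part solves \(\min _x |a^{\top }x-b|+\frac{\mu }{2}\Vert x\Vert ^2\).

```python
import numpy as np

# Minimal PDHG (primal-dual hybrid gradient) sketch for the saddle-point problem
#   min_x max_{|y|<=1}  y * (a^T x - b) + (mu/2) ||x||^2 .
# Illustrative only: the paper applies PDHG to a decomposed two-stage DRO problem.

def pdhg(a, b, mu, tau=0.5, sigma=0.5, iters=5000):
    x = np.zeros_like(a)
    x_bar = x.copy()
    y = 0.0
    for _ in range(iters):
        # dual ascent step with projection onto [-1, 1]
        y = float(np.clip(y + sigma * (a @ x_bar - b), -1.0, 1.0))
        # primal descent step; the (mu/2)||x||^2 term gives a closed-form prox
        x_new = (x - tau * y * a) / (1.0 + tau * mu)
        # extrapolation step (theta = 1)
        x_bar = 2.0 * x_new - x
        x = x_new
    return x

a = np.array([1.0, 0.0])
x_star = pdhg(a, b=1.0, mu=1.0)
print(x_star)  # close to [1, 0], the minimizer of |x1 - 1| + 0.5 ||x||^2
```

Convergence of this scheme requires \(\tau \sigma \Vert K\Vert ^2\le 1\); here the linear operator is \(K=a^{\top }\) with \(\Vert a\Vert =1\), so \(\tau =\sigma =0.5\) suffices.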
Notes
In some literature, the total variation metric is defined as \(\mathsf {dl}_{TV}=\sup _{B\in \mathscr {B}}|P(B) - Q(B)|\); see [9].
References
Athreya, K.B., Lahiri, S.N.: Measure Theory and Probability Theory. Springer, New York (2006)
Bertsimas, D., Doan, X.V., Natarajan, K., Teo, C.P.: Models for minimax stochastic linear optimization problems with risk aversion. Math. Oper. Res. 35, 580–602 (2010)
Bertsimas, D., Van Parys, B.: Bootstrap robust prescriptive analytics. arXiv:1711.09974 (2017)
Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging. Vis. 40, 120–145 (2011)
Delage, E., Ye, Y.: Distributionally robust optimization under moment uncertainty with application to data-driven problems. Oper. Res. 58, 592–612 (2010)
Esser, E., Zhang, X., Chan, T.: A general framework for a class of first order primal-dual algorithms for convex optimization in imaging science. SIAM J. Imaging Sci. 3, 1015–1046 (2010)
Fan, K.: Minimax theorems. Proc. Natl. Acad. Sci. USA 39, 42–47 (1953)
Gao, R., Kleywegt, A.: Distributionally robust stochastic optimization with Wasserstein distance. arXiv:1604.02199 (2016)
Gibbs, A.L., Su, F.E.: On choosing and bounding probability metrics. Int. Stat. Rev. 70, 419–435 (2002)
Goh, J., Sim, M.: Distributionally robust optimization and its tractable approximations. Oper. Res. 58, 902–917 (2010)
Goldstein, T., Li, M., Yuan, X., Esser, E., Baraniuk, R.: Adaptive primal-dual hybrid gradient methods for saddle-point problems. arXiv:1305.0546 (2013)
Guo, S., Xu, H., Zhang, L.: Convergence analysis for mathematical programs with distributionally robust chance constraint. SIAM J. Optim. 27, 784–816 (2017)
Guo, S., Xu, H.: Distributionally robust shortfall risk optimization model and its approximation. Math. Program. 174, 473–498 (2019)
Hanasusanto, G.A., Kuhn, D.: Conic programming reformulations of two-stage distributionally robust linear programs over Wasserstein balls. Oper. Res. 66, 849–869 (2018)
He, B., Ma, F., Yuan, X.: An algorithm framework of generalized primal-dual hybrid gradient methods for saddle point problems. J. Math. Imaging. Vis. 58, 279–293 (2017)
He, B., Yuan, X.: Convergence analysis of primal-dual algorithms for a saddle-point problem: from contraction perspective. SIAM J. Imaging Sci. 5, 119–149 (2012)
Jiang, R., Guan, Y.: Risk-averse two-stage stochastic program with distributional ambiguity. Oper. Res. 66, 1390–1405 (2018)
Liu, Y., Pichler, A., Xu, H.: Discrete approximation and quantification in distributionally robust optimization. Math. Oper. Res. 44, 19–37 (2019)
Liu, Y., Yuan, X., Zeng, S., Zhang, J.: Primal-dual hybrid gradient method for distributionally robust optimization problems. Oper. Res. Lett. 45, 625–630 (2017)
Liu, Y., Yuan, X., Zhang, J.: Quantitative stability analysis of stochastic programs with distributionally robust second order dominance constraints, manuscript (2017)
Love, D., Bayraksan, G.: Phi-divergence constrained ambiguous stochastic programs for data-driven optimization, available on researchgate.net (2016)
Mohajerin Esfahani, P., Kuhn, D.: Data-driven distributionally robust optimization using the Wasserstein metric: performance guarantees and tractable reformulations. Math. Program. 171, 115–166 (2018)
Pardo, L.: Statistical Inference Based on Divergence Measures. Chapman and Hall/CRC, Boca Raton (2005)
Pflug, G.C., Pichler, A.: Approximations for probability distributions and stochastic optimization problems. In: Bertocchi, M., Consigli, G., Dempster, M.A.H. (eds.) Stochastic Optimization Methods in Finance and Energy, vol. 163 of International Series in Operations Research & Management Science. Springer, New York (2011)
Pflug, G.C., Pichler, A.: Multistage Stochastic Optimization. Springer Series in Operations Research and Financial Engineering. Springer, New York (2014)
Pflug, G.C., Wozabal, D.: Ambiguity in portfolio selection. Quant. Financ. 7, 435–442 (2007)
Pichler, A., Xu, H.: Quantitative stability analysis for minimax distributionally robust risk optimization, To appear in Math. Program. (2018)
Rachev, S.T.: Probability Metrics and the Stability of Stochastic Models. Wiley, West Sussex (1991)
Rockafellar, R., Wets, R.J.-B.: Scenarios and policy aggregation in optimization under uncertainty. Math. Oper. Res. 16, 119–147 (1991)
Rockafellar, R., Sun, J.: Solving monotone stochastic variational inequalities and complementarity problems by progressive hedging. Math. Program. 174, 453–471 (2019)
Römisch, W.: Stability of stochastic programming problems. In: Ruszczyński, A., Shapiro, A. (eds.) Stochastic Programming. Handbook in OR & MS, vol. 10. North-Holland Publishing Company, Amsterdam (2003)
Ruszczyński, A.: Decomposition methods. In: Ruszczyński, A., Shapiro, A. (eds.) Stochastic Programming. Handbook in OR & MS, vol. 10. North-Holland Publishing Company, Amsterdam (2003)
Ruszczyński, A.: Decomposition methods in stochastic programming. Math. Program. 79, 333–353 (1997)
Rahimian, H., Bayraksan, G., Homem-de-Mello, T.: Identifying effective scenarios in distributionally robust stochastic programs with total variation distance. Math. Program. 173, 393–430 (2019)
Ruszczyński, A., Shapiro, A. (eds.): Stochastic Programming. Handbook in OR & MS, vol. 10. North-Holland Publishing Company, Amsterdam (2003)
Scarf, H.: A min–max solution of an inventory problem. In: Arrow, K.S., Karlin, S., Scarf, H.E. (eds.) Studies in the Mathematical Theory of Inventory and Production, pp. 201–209. Stanford University Press, Palo Alto (1958)
Shapiro, A., Ahmed, S.: On a class of minimax stochastic programs. SIAM J. Optim. 14, 1237–1249 (2004)
Shapiro, A.: On duality theory of conic linear problems. In: Goberna, M.A., López, M.A. (eds.) Semi-Infinite Programming. Nonconvex Optimization and Its Applications, vol. 57. Springer, Boston (2001)
Sun, J., Liao, L.Z., Rodrigues, B.: Quadratic two-stage stochastic optimization with coherent measures of risk. Math. Program. 168, 599–613 (2018)
Sun, H., Xu, H.: Convergence analysis for distributionally robust optimization and equilibrium problems. Math. Oper. Res. 41, 377–401 (2016)
Weiss, P., Blanc-Feraud, L., Aubert, G.: Efficient schemes for total variation minimization under constraints in image processing. SIAM J. Sci. Comput. 31, 2047–2080 (2009)
Wiesemann, W., Kuhn, D., Sim, M.: Distributionally robust convex optimization. Oper. Res. 62, 1358–1376 (2014)
Xu, H., Liu, Y., Sun, H.: Distributionally robust optimization with matrix moment constraints: Lagrange duality and cutting plane methods. Math. Program. 169, 489–529 (2018)
Zhang, Y., Jiang, R., Shen, S.: Ambiguous chance-constrained binary programs under mean-covariance information. SIAM J. Optim. 28, 2922–2944 (2018)
Zhang, X., Burger, M., Osher, S.: A unified primal-dual algorithm framework based on Bregman iteration. J. Sci. Comput. 46, 20–46 (2010)
Zhao, C., Guan, Y.: Data-driven risk-averse two-stage stochastic program with ζ-structure probability metrics, available at Optimization Online (2015). http://www.optimization-online.org/DB_FILE/2015/07/5014.pdf
Zolotarev, V.M.: Probability metrics. Teoriya Veroyatnostei i ee Primeneniya 28, 264–287 (1983)
Zhang, Z., Ahmed, S., Lan, G.: Efficient algorithms for distributionally robust stochastic optimization with discrete scenario support. arXiv:1909.11216 (2019)
Zhu, M., Chan, T. F.: An efficient primal dual hybrid gradient algorithm for total variation image restoration. CAM Report 08-34, UCLA, Los Angeles, CA (2008)
Acknowledgements
We would like to thank Shabbir Ahmed for initiating this research in a private discussion with the third author during the 14th International Conference on Stochastic Programming in Búzios, and for his further encouragement during the preparation of the paper. We would also like to thank the two anonymous referees for insightful comments which helped us significantly strengthen the presentation of the paper.
Funding
Funding was provided by the National Natural Science Foundation of China (Grant Nos. 11871276 and 11771405).
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
Proof of Lemma 1
The inequality holds trivially if \(Q\in {\tilde{{{\mathcal {P}}}}}\), so we only consider the case when \(Q\not \in {\tilde{{{\mathcal {P}}}}}\). We proceed with the proof in three steps.
Step 1. By the definition of the total variation norm (see [1]), \(||P||= \sup _{\Vert \phi \Vert ^*\le 1} \langle P, \phi \rangle ,\) where \(\Vert \cdot \Vert ^*\) denotes the dual norm of \(\Vert \cdot \Vert\). Moreover, by the definition of the total variation metric
where \(\text{ cl }(\cdot )\) denotes the closure of a set under the topology of weak convergence, \(\langle P, \phi \rangle :=\int _{\tilde{\varXi }}\phi (\xi )P(d\xi )\), and the exchange is justified by [7, Theorem 2] under our assumption that \(\tilde{{{\mathcal {P}}}}\) is weakly compact. Note that we write \(\langle P, \phi \rangle\) for \({\mathbb {E}}_P[\phi ]\) because later on we will relax P from a probability measure to a positive measure; this notation makes it clear that P is a variable in the moment system and that the moment system is linear in P. It is easy to observe that \(\mathsf {dl}_{TV}(P,{{\mathcal {P}}})\le 2\). Moreover, under the Slater type condition (16), it follows by [38, Proposition 3.4] that
where \(\mathscr {M}_+(\tilde{\varXi })\) denotes the set of all positive measures defined on \(\tilde{\varXi }\), \(\varLambda :=\{(\lambda _1,\cdots ,\lambda _q): \lambda _i\succeq 0, \; \text{ for }\; i=p+1,\ldots ,q\},\) and \(A \bullet B\) denotes the Frobenius product of the two matrices A and B. If there exists some \(\xi _i\) such that \(\phi (\xi _i) -\lambda \bullet \varPsi (\xi _i) -\lambda _0>0\), then the value of (49) is \(-\infty\), because we can choose \(P=\alpha \delta _{\xi _i}(\cdot )\), where \(\delta _{\xi _i}(\cdot )\) denotes the Dirac probability measure at \(\xi _i\), and drive \(\alpha\) to \(+\infty\). Thus we are left to consider the case with
Consequently we can rewrite (49) as
The second inequality is due to the fact that the optimum is attained at \(P=0\). Summarizing the discussions above, we arrive at
The first equality is obtained by swapping the two “sup” operations, which is justified because their optimal values are bounded. To see how the second equality holds, we may compare the optimal values of the second and third programs above; denote the second by (P) and the third by (P’). Observe that (P’) is obtained from (P) by replacing \(\phi (\xi )\) in the objective with the largest possible value \(\min \{\lambda \bullet \varPsi (\xi )+\lambda _0, 1\}\) and in the constraint with the smallest possible value \(-1\), which makes the feasible set as large as possible. This means the optimal value of (P) is less than or equal to that of (P’). On the other hand, for any optimal solution \((\lambda ^*, \lambda _0^*)\) of (P’), let \(\phi ^*:=\min \{\lambda ^*\bullet \varPsi (\xi )+\lambda _0^*, 1\}\). Then \((\lambda ^*, \lambda _0^*, \phi ^*)\) is a feasible solution of (P), which implies in turn that the optimal value of (P’) is less than or equal to that of (P). Note also that the optimal value is \(\mathsf {dl}_{TV}(Q,{\tilde{{{\mathcal {P}}}}})\in [0,2]\).
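For finitely supported measures, the dual representation of the total variation metric used in Step 1 can be evaluated in closed form: the supremum over \(\Vert \phi \Vert _\infty \le 1\) is attained at \(\phi _i={\text {sign}}(p_i-q_i)\), so the metric reduces to the \(\ell _1\) distance between the probability vectors. The short Python sketch below is our own numerical illustration (the names are ours, not the paper's):

```python
import numpy as np

# dl_TV(P, Q) = sup_{||phi||_inf <= 1} <P - Q, phi>.  On a finite support the
# supremum is attained at phi_i = sign(p_i - q_i), giving the l1 distance.
def tv_distance(p, q):
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.abs(p - q).sum())  # = <p - q, sign(p - q)>

p = [0.5, 0.3, 0.2]
q = [0.2, 0.5, 0.3]
print(tv_distance(p, q))  # approximately 0.6; note dl_TV <= 2 always holds
```

With the alternative definition \(\sup _{B\in \mathscr {B}}|P(B)-Q(B)|\) mentioned in the Notes, the value is exactly half of this, i.e. 0.3 in this example.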
Step 2. We show that the optimization problem at the right hand side of (50) has a bounded optimal solution. Let
denote the feasible set of (50) and
Then
To see this, let \((\lambda , \lambda _0)\in {{\mathcal {F}}}\) and \(t=\Vert \lambda \Vert +|\lambda _0+1|\). If \(t=0\), then \((0_q,-1)\in \{(0_q,-1)+t{{\mathcal {C}}}: t\in [0,+\infty )\}\). Consider the case that \(t\ne 0\). Then it is easy to verify that \((\lambda , \lambda _0+1)/t\in {{\mathcal {C}}}\) and hence \((\lambda , \lambda _0)\in (0_q,-1)+t{{\mathcal {C}}}\). This shows \({{\mathcal {F}}}\subset \{(0_q,-1)+t{{\mathcal {C}}}: t\in [0,+\infty )\}\). The converse inclusion is obvious.
Note that \({{\mathcal {C}}}\ne \emptyset\). Otherwise we would have
which implies \(P\in \tilde{{{\mathcal {P}}}}\), a contradiction. In what follows, we consider the case when \({{\mathcal {C}}}\ne \emptyset\). From (51) we immediately have
On the other hand, by Assumption 1 and (17),
and there exists a closed neighborhood of \((0,0_{q})\) with radius \(\epsilon _0\), denoted by \(\epsilon _0{{\mathcal {B}}}\), such that
Let \(({\tilde{\lambda }}, {\tilde{\lambda }}_0)\in {{\mathcal {C}}}\). Then there exists \(\tilde{w}\in \epsilon _0{{\mathcal {B}}}\) (depending on \(({\tilde{\lambda }}, {\tilde{\lambda }}_0)\)) such that \(\langle {\tilde{w}}, (\tilde{\lambda },{\tilde{\lambda }}_0)\rangle \le -\epsilon _0\). In other words, there exist \(\tilde{P}\in \mathscr {M}_+(\tilde{\varXi })\) and \(\eta \in {{\mathcal {K}}}_-^{q-p}\) such that
Since \(-\eta \bullet {\tilde{\lambda }}_I\ge 0\), we deduce from (52) and (53) that \(- \tilde{\lambda }_0- {\tilde{\lambda }} \bullet \mu \le -\epsilon _0.\) The inequality holds for every \(({\tilde{\lambda }}, {\tilde{\lambda }}_0)\in {{\mathcal {C}}}\). Note that for any \((\lambda ,\lambda _0)\in {{\mathcal {F}}}\), we may write it in the form
where \(({\hat{\lambda }},{\hat{\lambda }}_0) =(0_q, -1)\), \((\tilde{\lambda },\tilde{\lambda }_0)\in {{\mathcal {C}}}\) and \(t\ge 0\). Observe that \(\langle Q, \min \{\lambda \bullet \varPsi +\lambda _0, 1\}\rangle \in [-1,1]\) and
Note that the optimal value of the optimization problem at the right hand side of (50) is positive, which implies that for any optimal solution \((\lambda ^*, \lambda _0^*) = (0_q, -1) + (\bar{\lambda }^*, \bar{\lambda }_0^*)\) with \((\bar{\lambda }^*, \bar{\lambda }_0^*)\in t^*{{\mathcal {C}}}\) we must have \(1-\mu \bullet 0_q -(-1) - t^* \epsilon _0 = 2-t^*\epsilon _0>0\), or equivalently \(t^*< \frac{2}{\epsilon _0}.\) Let \(t_1= \frac{2}{\epsilon _0}\) and \({{\mathcal {F}}}_1:={{\mathcal {F}}}_0 + \{t{{\mathcal {C}}}: t_1\ge t\ge 0\}\). Based on the discussions above, we conclude that the optimization problem at the right hand side of (50) has an optimal solution in \({{\mathcal {F}}}_1\).
Step 3. Let \(C_1:=\max _{(\lambda , \lambda _0)\in {{\mathcal {F}}}_1}\Vert \lambda \Vert\). Then
where \(C_1=\frac{2}{\epsilon _0}+1\). The inequality also holds for the case when \({{\mathcal {C}}}=\emptyset\) because \({{\mathcal {F}}}_0\subset {{\mathcal {F}}}_1\). Since the above result holds for all \({\tilde{{{\mathcal {P}}}}}\) with \(\bar{\varXi }\subset {\tilde{\varXi }}\subset \varXi\), which covers both the continuous and the discrete support set cases, the proof is complete. \(\square\)
Cite this article
Chen, Y., Sun, H. & Xu, H. Decomposition and discrete approximation methods for solving two-stage distributionally robust optimization problems. Comput Optim Appl 78, 205–238 (2021). https://doi.org/10.1007/s10589-020-00234-7