Abstract
We study chance-constrained problems in which the constraints involve the probability of a rare event. We discuss the relevance of such problems and show that the existing sampling-based algorithms cannot be applied directly in this case, since they require an impractical number of samples to yield reasonable solutions. We argue that importance sampling (IS) techniques, combined with a Sample Average Approximation (SAA) approach, can be effectively used in such situations, provided that variance can be reduced uniformly with respect to the decision variables. We give sufficient conditions to obtain such uniform variance reduction, and prove asymptotic convergence of the combined SAA-IS approach. As it often happens with IS techniques, the practical performance of the proposed approach relies on exploiting the structure of the problem under study; in our case, we work with a telecommunications problem with Bernoulli input distributions, and show how variance can be reduced uniformly over a suitable approximation of the feasibility set by choosing proper parameters for the IS distributions. Although some of the results are specific to this problem, we are able to draw general insights that can be useful for other classes of problems. We present numerical results to illustrate our findings.
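The core idea can be illustrated with a small, self-contained Python sketch. This is ours, not the paper's: the toy rare event \(\{Z>w\}\) with independent Bernoulli inputs, the twisted parameters q, and all function names are hypothetical choices made purely for illustration. A crude Monte Carlo estimator wastes almost all samples when the event is rare, whereas an importance-sampling estimator that samples from twisted Bernoulli distributions and reweights by the likelihood ratio concentrates samples on the event.

```python
import random

def crude_mc(p, w, n_samples, rng):
    # Crude Monte Carlo estimate of P{Z > w}, where Z is a sum of
    # independent Bernoulli(p_i) variables; most samples miss the event.
    hits = 0
    for _ in range(n_samples):
        z = sum(1 for pi in p if rng.random() < pi)
        if z > w:
            hits += 1
    return hits / n_samples

def is_mc(p, q, w, n_samples, rng):
    # Importance sampling: draw from twisted Bernoulli(q_i) distributions
    # instead, and weight each sample in the event by the likelihood ratio.
    total = 0.0
    for _ in range(n_samples):
        xi = [1 if rng.random() < qi else 0 for qi in q]
        if sum(xi) > w:
            lr = 1.0
            for b, pi, qi in zip(xi, p, q):
                lr *= (pi / qi) if b else ((1.0 - pi) / (1.0 - qi))
            total += lr
    return total / n_samples
```

With \(p_i=0.1\), five inputs and \(w=4\), the target probability is \(10^{-5}\); with 20,000 samples the crude estimator typically sees zero hits, while the IS estimator with \(q_i=0.9\) already gives several correct digits.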



Notes
Random lower semicontinuous functions are called normal integrands in [36].
References
Adas, A.: Traffic models in broadband networks. IEEE Commun. Mag. 35(7), 82–89 (1997)
Andrieu, L., Henrion, R., Römisch, W.: A model for dynamic chance constraints in hydro power reservoir management. Eur. J. Oper. Res. 207(2), 579–589 (2010)
Artstein, Z., Wets, R.J.B.: Consistency of minimizers and the SLLN for stochastic programs. J. Convex Anal. 2(1–2), 1–17 (1996)
Asmussen, S., Glynn, P.: Stochastic Simulation. Springer, New York (2007)
Beraldi, P., Ruszczyński, A.: The probabilistic set-covering problem. Oper. Res. 50(6), 956–967 (2002)
Bonami, P., Lejeune, M.: An exact solution approach for portfolio optimization problems under stochastic and integer constraints. Oper. Res. 57(3), 650–670 (2009)
Calafiore, G., Campi, M.C.: Uncertain convex programs: randomized solutions and confidence levels. Math. Program. 102(1), 25–46 (2005)
Campi, M.C., Garatti, S.: The exact feasibility of randomized solutions of uncertain convex programs. SIAM J. Optim. 19(3), 1211–1230 (2008)
Campi, M.C., Garatti, S.: A sampling-and-discarding approach to chance-constrained optimization: feasibility and optimality. J. Optim. Theory Appl. 148(2), 257–280 (2011)
Campi, M.C., Garatti, S., Prandini, M.: The scenario approach for systems and control design. Ann. Rev. Control 33(2), 149–157 (2009)
Carniato, A., Camponogara, E.: Integrated coal-mining operations planning: modeling and case study. Int. J. Coal Prep. Util. 31(6), 299–334 (2011)
Charnes, A., Cooper, W.W., Symonds, G.H.: Cost horizons and certainty equivalents: an approach to stochastic programming of heating oil. Manag. Sci. 4, 235–263 (1958)
Chung, K.L.: A Course in Probability Theory, 2nd edn. Academic Press, New York (1974)
Dantzig, G.B., Glynn, P.W.: Parallel processors for planning under uncertainty. Ann. Oper. Res. 22(1), 1–21 (1990)
Dentcheva, D., Prékopa, A., Ruszczyński, A.: Concavity and efficient points of discrete distributions in probabilistic programming. Math. Program. 89(1), 55–77 (2000)
Dorfleitner, G., Utz, S.: Safety first portfolio choice based on financial and sustainability returns. Eur. J. Oper. Res. 221(1), 155–164 (2012)
Duckett, W.: Risk analysis and the acceptable probability of failure. Struct. Eng. 83(15), 25–26 (2005)
Ermoliev, Y.M., Ermolieva, T.Y., MacDonald, G., Norkin, V.: Stochastic optimization of insurance portfolios for managing exposure to catastrophic risks. Ann. Oper. Res. 99(1–4), 207–225 (2000)
Henrion, R., Römisch, W.: Metric regularity and quantitative stability in stochastic programs with probabilistic constraints. Math. Program. 84(1), 55–88 (1999)
Homem-de-Mello, T., Bayraksan, G.: Monte Carlo methods for stochastic optimization. Surv. Oper. Res. Manag. Sci. 19(1), 56–85 (2014)
Infanger, G.: Monte Carlo (importance) sampling within a Benders decomposition algorithm for stochastic linear programs. Ann. Oper. Res. 39(1), 69–95 (1992)
Jiang, R., Guan, Y.: Data-driven chance constrained stochastic program (2012). http://www.optimization-online.org
Kahn, H., Harris, T.: Estimation of particle transmission by random sampling. Nat. Bur. Stand. Appl. Math. Ser. 12, 27–30 (1951)
L’Ecuyer, P., Mandjes, M., Tuffin, B.: Importance sampling in rare event simulation. In: Rubino, G., Tuffin, B. (eds.) Rare Event Simulation using Monte Carlo Methods, Chap. 2. Wiley, New York (2009)
Lejeune, M.: Pattern definition of the p-efficiency concept. Ann. Oper. Res. 200(1), 23–36 (2012)
Li, W.L., Zhang, Y., So, A.C., Win, Z.: Slow adaptive OFDMA systems through chance constrained programming. IEEE Trans. Signal Process. 58(7), 3858–3869 (2010)
Liu, Y., Guo, H., Zhou, F., Qin, X., Huang, K., Yu, Y.: Inexact chance-constrained linear programming model for optimal water pollution management at the watershed scale. J. Water Resour. Plan. Manag. 134(4), 347–356 (2008)
Luedtke, J., Ahmed, S.: A sample approximation approach for optimization with probabilistic constraints. SIAM J. Optim. 19(2), 674–699 (2008)
Minoux, M.: Discrete cost multicommodity network optimization problems and exact solution methods. Ann. Oper. Res. 106(1–4), 19–46 (2001)
Minoux, M.: Multicommodity network flow models and algorithms in telecommunications. In: Resende, M., Pardalos, P. (eds.) Handbook of Optimization in Telecommunications, pp. 163–184. Springer, Berlin (2006)
Nemirovski, A., Shapiro, A.: Convex approximations of chance constrained programs. SIAM J. Optim. 17(4), 969–996 (2006)
Pagnoncelli, B., Ahmed, S., Shapiro, A.: Sample average approximation method for chance constrained programming: theory and applications. J. Optim. Theory Appl. 142(2), 399–416 (2009)
Pagnoncelli, B.K., Reich, D., Campi, M.C.: Risk-return trade-off with the scenario approach in practice: a case study in portfolio selection. J. Optim. Theory Appl. 155(2), 707–722 (2012)
Prékopa, A.: Probabilistic programming. In: Ruszczyński, A., Shapiro, A. (eds.) Stochastic Programming, vol. 10, pp. 267–351. Elsevier, Amsterdam (2004)
Ramaswami, R., Sivarajan, K., Sasaki, G.: Optical Networks: A Practical Perspective. Morgan Kaufmann, Los Altos (2009)
Rockafellar, R.T., Wets, R.J.B.: Variational Analysis, A Series of Comprehensive Studies in Mathematics, vol. 317. Springer, Berlin (1998)
Römisch, W., Schultz, R.: Stability analysis for stochastic programs. Ann. Oper. Res. 30(1), 241–266 (1991)
Rosenbluth, M.N., Rosenbluth, A.W.: Monte Carlo calculation of the average extension of molecular chains. J. Chem. Phys. 23, 356 (1955)
Rubinstein, R.Y.: Cross-entropy and rare events for maximal cut and partition problems. ACM Trans. Model. Comput. Simul. 12(1), 27–53 (2002)
Rubinstein, R.Y., Shapiro, A.: Discrete Event Systems: Sensitivity Analysis and Stochastic Optimization by the Score Function Method. Wiley, Chichester (1993)
Shapiro, A.: Monte Carlo sampling methods. In: Ruszczyński, A., Shapiro, A. (eds.) Stochastic Programming, Handbooks in Operations Research and Management Science, vol. 10. Elsevier, Amsterdam (2003)
Shapiro, A., Dentcheva, D., Ruszczyński, A.: Lectures on Stochastic Programming: Modeling and Theory, vol. 9. SIAM, Philadelphia (2009)
Soekkha, H.M.: Aviation Safety: Human Factors, System Engineering, Flight Operations, Economics, Strategies, Management. VSP, Utrecht (1997)
Thieu, Q.T., Hsieh, H.Y.: Use of chance-constrained programming for solving the opportunistic spectrum sharing problem under Rayleigh fading. In: 9th International Wireless Communications and Mobile Computing Conference (IWCMC), pp. 1792–1797 (2013)
Tran, Q.K., Parpas, P., Rustem, B., Ustun, B., Webster, M.: Importance sampling in stochastic programming: a Markov chain Monte Carlo approach (2013). http://www.optimization-online.org
Vallejos, R., Zapata-Beghelli, A., Albornoz, V., Tarifeño, M.: Joint routing and dimensioning of optical burst switching networks. Photon Netw. Commun. 17(3), 266–276 (2009)
Acknowledgments
The authors acknowledge the financial support of Grant Anillo ACT-88, Basal Center CMM-UCh, CIRIC-INRIA Chile (J.B., E.M., G.C.), Programa Iniciativa Científica Milenio NC130062 (J.B.) and FONDECYT Grants 1120244 (T.H., B.P.) and 1130681 (E.M.).
Appendices
Appendix 1: MIP formulation for \(\hat{p}^{\text {IS}_0}_a\) estimator under heterogeneous demand
We can formulate this problem as the following integer linear programming model:
Binary variables \(v_{a,k}\), together with Eqs. (57) and (58), ensure that \(v_{a,k}=1\) if and only if \(\sum \nolimits _{c=1}^C y_{c,a}=k\). The role of the binary variables u is explained in the following lemma.
Lemma 2
Let (x, w, u, v) be an optimal solution of the previous formulation. Then there exists an optimal solution \((x,w,\hat{u},v)\) such that:
1. \(\sum _{c=1}^C \hat{\xi }^s_c y_{c,a} \le w_a\) if and only if \(\hat{u}_{a,s,k}=0\) for all \(k=1,\ldots ,C\);
2. if \(\hat{u}_{a,s,k}=1\), then \(\sum _{c=1}^C \hat{\xi }^s_c y_{c,a} =k\).
Hence,
Proof
First, note that constraint (54) imposes that if \(u_{a,s,k}=0\) for all k, then \(\sum _{c=1}^C \hat{\xi }^s_c y_{c,a} \le w_a\). Suppose that \(\sum _{c=1}^C \hat{\xi }^s_c y_{c,a} \le w_a\) but \(u_{a,s,k'}=1\) for some \(k'\). Defining \(\hat{u}_{a,s,k}=0\) for all k and \(\hat{u}=u\) for the remaining variables, \(\hat{u}\) still satisfies Eqs. (54) and (55), and since \(\hat{u}\le u\) it also satisfies Eqs. (59) and (56); hence \((x,w,\hat{u},v)\) is also optimal. Repeating this procedure, it is easy to see that we obtain a solution that satisfies condition (1). For the second condition, suppose that \(u_{a,s,k}=1\) for some k but \(\sum _{c=1}^C \hat{\xi }^s_c y_{c,a} > k\). Let \(\hat{k}=\sum _{c=1}^C \hat{\xi }^s_c y_{c,a}\) and define \(\hat{u}_{a,s,\hat{k}}=1\), \(\hat{u}_{a,s,k}=0\) for all \(k\ne \hat{k}\), and \(\hat{u}=u\) for the remaining variables. By definition, \((x,w,\hat{u},v)\) satisfies (54) and (59), and since \(\hat{k}>k\) it also satisfies (55). On the other hand, since \(\lambda >0\) we have \(e^{-\lambda k}>e^{-\lambda \hat{k}}\), so it also satisfies (56), and thus \((x,w,\hat{u},v)\) is also optimal. Repeating this procedure, we obtain a solution that satisfies condition (2). \(\square \)
Lemma 2 shows that the optimal solution (y, w) of this MIP formulation satisfies
which is the desired approximation of the constraint \(\hat{p}^{\text {IS}_0}_a \le \alpha \).
Appendix 2: Proofs of results
1.1 Proof of Lemma 1
Lemma 1
Suppose that the set-function \(I_x\) is such that \(G(x,\cdot )\) is \(I_x\)-determined for each \(x \in X\). Given an i.i.d. sample \((\hat{\xi }^1,\ldots ,\hat{\xi }^N)\) from the distribution of \(\hat{\xi }\), let
Then \(\hat{p}^{\text {IS}_0}(x)\) is also an unbiased estimator of p(x). Moreover,
Proof
First let us prove that the estimator \(\hat{p}^{\text {IS}_0}(x)\) is unbiased, for which it suffices to show that \(\mathbb {E}_{\hat{\xi }}\left[ {\mathbbm {1}}_{\{G(x,\hat{\xi })>0\}}L_x(\hat{\xi })\right] =p(x)\). Indeed, we have
where the second equality follows from the assumption that \(G(x,\cdot )\) is \(I_x\)-determined, which implies that \(G(x,\hat{\xi })\) is measurable with respect to the sigma-algebra generated by \((\hat{\xi }_i)_{i \in I_x}\).
For the second assertion of the lemma, note that
and therefore
\(\square \)
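The unbiasedness claim can be checked numerically. In the sketch below (the function G, the index set idx standing in for \(I_x\), and all parameter values are hypothetical, chosen only for illustration), only the coordinates in \(I_x\) are twisted and only they enter the likelihood ratio, exactly as in the \(\hat{p}^{\text {IS}_0}\) construction; the estimator still averages to p(x) because G depends on those coordinates alone.

```python
import random

def p_hat_is0(G, idx, p, q, n_samples, rng):
    # IS_0-style estimator: sample every coordinate, but twist (and
    # weight by the likelihood ratio) only the coordinates in idx = I_x
    # that G(x, .) actually depends on; the rest use the original p_i.
    total = 0.0
    for _ in range(n_samples):
        xi = []
        lr = 1.0
        for i, (pi, qi) in enumerate(zip(p, q)):
            if i in idx:
                b = 1 if rng.random() < qi else 0
                lr *= (pi / qi) if b else ((1.0 - pi) / (1.0 - qi))
            else:
                b = 1 if rng.random() < pi else 0  # untwisted coordinate
            xi.append(b)
        if G(xi) > 0:
            total += lr
    return total / n_samples
```

For instance, if G depends only on the first three of six Bernoulli(0.2) coordinates, p(x) is \(0.2^3=0.008\) and the estimate matches it even though the last three coordinates are never reweighted.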
1.2 Proof of Proposition 3
Proposition 3
Let \(\zeta _1,\ldots ,\zeta _m\) be \(m\ge 1\) independent Bernoulli random variables with \(\mathbb {P}\{\zeta _i=1\}=p_i\), and suppose that \(0<p_i<1\) for all i. Let \(Z:= \sum _{i=1}^m \zeta _i\), and define \(\delta := \min _i\, p_i(1-p_i) > 0\). Then, we have
Proof
Let \(u:[0,m]\rightarrow \mathbb {R}\) be the function defined as \(u(t):= m^2 - t^2\). Since \(u(\cdot )\) is nonnegative and decreasing on [0, m], we have that
where the last inequality follows from Markov’s inequality. Thus, we have
Next, notice that independence of \(\{\zeta _i\}\) implies that \(\text {Var}(Z)=\sum _{i=1}^m p_i(1-p_i)\). Moreover, since \(0<\mathbb {E}[Z]<m\) we have that \(m + \mathbb {E}[Z] < 2m\), \(m - \mathbb {E}[Z]< m\) and thus from (66) we have that
\(\square \)
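The two ingredients of the proof, the independence identity \(\text {Var}(Z)=\sum _{i=1}^m p_i(1-p_i)\) and the resulting lower bound \(m\delta \le \text {Var}(Z)\), are easy to confirm by simulation. A minimal sketch (the function name and parameter values are ours, for illustration only):

```python
import random

def empirical_variance_of_sum(p, n_samples, rng):
    # Draw Z = sum of independent Bernoulli(p_i) repeatedly and return
    # the unbiased sample variance, to compare with sum_i p_i(1 - p_i).
    vals = [sum(1 for pi in p if rng.random() < pi) for _ in range(n_samples)]
    mean = sum(vals) / n_samples
    return sum((v - mean) ** 2 for v in vals) / (n_samples - 1)
```

With \(p=(0.2,0.5,0.8)\) the exact variance is \(0.16+0.25+0.16=0.57\), which indeed dominates \(m\delta = 3\cdot 0.16 = 0.48\).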
1.3 Proof of Theorem 1
Theorem 1
Suppose that \(0<\rho _c<1\) for all \(c=1,\dots , C\). Let \(x=(y,w)\) be such that \(w \in \mathbb {N}\) satisfies \(\sum _{c=1}^C \rho _c y_c < w \le \sum _{c=1}^C y_c-1\). Then the function \(B_{x}(\varvec{\lambda })\) is convex and there exists \(\lambda ^*_{x}\in \mathbb {R}_+\cup \{\infty \}\) such that the vector \(\varvec{\lambda }\) defined by \(\lambda _c=\lambda ^*_{x}\) for all \(c=1,\dots ,C\) minimizes \(B_{x}(\varvec{\lambda })\). If \(w=\sum _{c=1}^C y_c-1\), then the optimal \(\lambda ^*_{x}\) is \(\lambda ^*_{x}=\infty \) and \(\hat{\rho }_c(\lambda ^*_{x})=1\); otherwise, \(\lambda ^*_{x}\) and \(\hat{\rho }_c(\lambda ^*_{x})\) satisfy
To prove the theorem we need the following lemma, whose proof is given after that of the theorem.
Lemma 3
For \(n\ge 1\), let \(\rho _i\), \(i=1,\ldots ,n\) be numbers such that \(\rho _i \in (0,1)\) and \(\rho _1 \ge \rho _2 \ge \ldots \ge \rho _n\). Given an integer w such that \(0 \le w \le n-1\), consider problem (P) defined as follows:
Then, there exists an optimal solution to (P) that satisfies \(\lambda _1 \le \lambda _2 \le \ldots \le \lambda _n\).
Proof
(of Theorem 1) Let \(n=\sum _{c=1}^C y_c\). Without loss of generality, and to simplify notation, assume that the set \(\{c: y_c=1\}\) is \(\{1,\ldots ,n\}\). Since the \(\log \) function is increasing, we have that
By Lemma 3, minimizing \(\log (B_{x}(\varvec{\lambda }))\) amounts to solving the following problem:
Note that the objective function of the above problem is strictly convex in \(\varvec{\lambda }\). In fact, its second derivatives are
Since \(B_x(\varvec{\lambda })=\exp (\log (B_x(\varvec{\lambda })))\) and \(\log (B_x(\varvec{\lambda }))\) is convex—though not strictly convex due to the components \(\lambda _c\) such that \(y_c=0\)—it follows that \(B_x\) is convex in \(\varvec{\lambda }\). Of course, the components \(\lambda _c\) such that \(y_c=0\) do not affect the value of \(B_x(\varvec{\lambda })\).
Suppose first that \(w=n-1\). Then, the first derivative of the objective function in (67) is given by
so we see that \(\lim _{\varvec{\lambda }\rightarrow \infty } \nabla \psi (\varvec{\lambda }) =0\). In particular, the limit can be taken along the diagonal \(\lambda _i=\lambda \), \(i=1,\ldots ,n\); that is, in this case the optimal solution of (67)–(69) is \(\lambda _i=\infty \), \(i=1,\ldots ,n\).
Consider now the case \(w<n-1\). We will show that problem (67)–(69) has a unique optimal solution, which can be found by writing the Karush–Kuhn–Tucker (KKT) conditions as follows:
where \(\varvec{\mu }=(\mu _i)\) is the vector of Lagrange multipliers of constraints (68) and \(\mu _0\) is the Lagrange multiplier of constraint (69).
Consider now a particular choice of vectors \(\varvec{\mu }\) and \(\varvec{\lambda }\) defined as follows. All components of \(\varvec{\lambda }\) are identical, with \(\lambda _i=\lambda ^*\), where \(\lambda ^*\in \mathbb {R}_{+}\) solves the equation
Note that we can always find such \(\lambda ^*\), since the function \(\varphi (\lambda )\) is continuous and increasing, and
The inequalities in (76) follow from the assumptions of the theorem on w and the fact that we are analyzing the case \(w<n-1\). The components of \(\varvec{\mu }\) are defined as
We claim that \(\varvec{\mu }\) and \(\varvec{\lambda }\) satisfy the KKT conditions (70)–(74) laid out above. To see that, observe that Eqs. (78)–(79) imply (70). Equation (71) follows from (75), since we have
and the latter term coincides with \(\mu _{n-1}\) defined in (79). Equations (72) and (73) are trivially satisfied. Finally, we show that (74) holds, with strict inequality if \(i\ge 1\). Indeed, (75) implies that
and we clearly have
since each term in the sum is less than 1. \(\square \)
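For concreteness, the standard exponential twist of a Bernoulli parameter, \(\hat{\rho }(\lambda )=\rho e^{\lambda }/(1-\rho +\rho e^{\lambda })\), is consistent with the theorem's boundary behavior \(\hat{\rho }_c(\lambda ^*_{x})=1\) when \(\lambda ^*_{x}=\infty \). The sketch below is an illustration only: we assume, purely for the sake of the example, that \(\lambda ^*\) is chosen so that the twisted mean load \(\sum _i \hat{\rho }_i(\lambda )\) hits a target level such as w; the paper's Eq. (75) defines \(\lambda ^*\) through \(\varphi \), which is not reproduced here. The root is found by bisection, exploiting the fact that the map is continuous and increasing in \(\lambda \), just as \(\varphi \) is.

```python
import math

def twisted_rho(rho, lam):
    # Exponentially twisted Bernoulli parameter; tends to 1 as lam -> inf.
    e = math.exp(lam)
    return rho * e / (1.0 - rho + rho * e)

def solve_lambda(rhos, target, lo=0.0, hi=50.0):
    # Bisection for lambda with sum_i twisted_rho(rho_i, lambda) = target;
    # valid when sum(rhos) < target < len(rhos), by monotone continuity.
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if sum(twisted_rho(r, mid) for r in rhos) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For four channels with \(\rho _c=0.3\) and target 2, the common twist solves \(0.3e^{\lambda }/(0.7+0.3e^{\lambda })=0.5\), i.e. \(\lambda =\log (7/3)\).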
Proof
(of Lemma 3) Suppose that \(\varvec{\lambda }=(\lambda _1, \ldots , \lambda _n)\) is an optimal solution and there exists some \(j<n\) such that \(\lambda _j>\lambda _{j+1}\). We will show that \(\bar{\varvec{\lambda }}\), defined by \(\bar{\lambda }_j=\lambda _{j+1}\), \(\bar{\lambda }_{j+1}=\lambda _j\) and \(\bar{\lambda }_i=\lambda _i\) for \(i\notin \{j,j+1\}\), has an objective value no worse than that of \(\varvec{\lambda }\). Let \(\varDelta \) denote the difference in objective value between \(\varvec{\lambda }\) and \(\bar{\varvec{\lambda }}\), i.e.,
We will prove that \(\varDelta \ge 0\), showing that \({\bar{\varvec{\lambda }}}\) is no worse than \(\varvec{\lambda }\). Note initially that
since the maximum value on both sides is equal to the sum of the smallest \(w+1\) components of the vector \(\varvec{\lambda }\). Thus, we only need to compare the remaining part of the objective function, i.e., we have
Since \(\bar{\lambda }_{j}=\lambda _{j+1}\) and \(\bar{\lambda }_{j+1}=\lambda _{j}\), it follows that
Note that the argument inside the \(\log \) is positive, since \(\lambda _j>\lambda _{j+1}\). Moreover, since \(\rho _j \ge \rho _{j+1}\), we see that \(1/\rho _{j}-1\le 1/\rho _{j+1}-1\) and hence we conclude that \(\varDelta \ge 0\). \(\quad \square \)
Barrera, J., Homem-de-Mello, T., Moreno, E. et al. Chance-constrained problems and rare events: an importance sampling approach. Math. Program. 157, 153–189 (2016). https://doi.org/10.1007/s10107-015-0942-x