Abstract
We consider smooth stochastic programs and develop a discrete-time optimal-control problem for adaptively selecting sample sizes in a class of algorithms based on variable sample average approximations (VSAA). The control problem aims to minimize the expected computational cost to obtain a near-optimal solution of a stochastic program and is solved approximately using dynamic programming. The optimal-control problem depends on unknown parameters such as rate of convergence, computational cost per iteration, and sampling error. Hence, we implement the approach within a receding-horizon framework where parameters are estimated and the optimal-control problem is solved repeatedly during the calculations of a VSAA algorithm. The resulting sample-size selection policy consistently produces near-optimal solutions in short computing times as compared to other plausible policies in several numerical examples.
References
Alexander, S., Coleman, T.F., Li, Y.: Minimizing CVaR and VaR for a portfolio of derivatives. J. Bank. Finance 30, 583–605 (2006)
Attouch, H., Wets, R.J.-B.: Epigraphical processes: laws of large numbers for random lsc functions. In: Séminaire d'Analyse Convexe, Montpellier, pp. 13.1–13.29 (1990)
Bastin, F., Cirillo, C., Toint, P.L.: An adaptive Monte Carlo algorithm for computing mixed logit estimators. Comput. Manag. Sci. 3(1), 55–79 (2006)
Bayraksan, G., Morton, D.P.: A sequential sampling procedure for stochastic programming. Oper. Res. 59(4), 898–913 (2011)
Bertsekas, D.P.: Dynamic Programming and Optimal Control, 3rd edn. Athena Scientific, Belmont (2007)
Betts, J.T., Huffman, W.P.: Mesh refinement in direct transcription methods for optimal control. Optim. Control Appl. Methods 19, 1–21 (1998)
Billingsley, P.: Probability and Measure. Wiley, New York (1995)
Deng, G., Ferris, M.C.: Variable-number sample-path optimization. Math. Program., Ser. B 117, 81–109 (2009)
Ermoliev, Y.: Stochastic quasigradient methods. In: Ermoliev, Y., Wets, R.J.-B. (eds.) Numerical Techniques for Stochastic Optimization. Springer, New York (1988)
Gill, P.E., Hammarling, S.J., Murray, W., Saunders, M.A., Wright, M.H.: LSSOL 1.0 User's guide. Technical Report SOL-86-1, Systems Optimization Laboratory, Stanford University, Stanford, CA (1986)
Grant, M., Boyd, S.: CVX: Matlab software for disciplined convex programming, version 1.21 (2010). http://cvxr.com/cvx
He, L., Polak, E.: Effective diagonalization strategies for the solution of a class of optimal design problems. IEEE Trans. Autom. Control 35(3), 258–267 (1990)
Higle, J.L., Sen, S.: Stochastic Decomposition: a Statistical Method for Large Scale Stochastic Linear Programming. Springer, New York (1996)
Holmstrom, K.: TOMLAB optimization (2009). http://tomopt.com
Homem-de-Mello, T.: Variable-sample methods for stochastic optimization. ACM Trans. Model. Comput. Simul. 13(2), 108–133 (2003)
Homem-de-Mello, T., Shapiro, A., Spearman, M.L.: Finding optimal material release times using simulation-based optimization. Manag. Sci. 45(1), 86–102 (1999)
Hu, J., Fu, M.C., Marcus, S.I.: A model reference adaptive search method for global optimization. Oper. Res. 55(3), 549–568 (2007)
Infanger, G.: Planning Under Uncertainty: Solving Large-Scale Stochastic Linear Programs. Thomson Learning, Washington (1994)
Kall, P., Meyer, J.: Stochastic Linear Programming, Models, Theory, and Computation. Springer, Berlin (2005)
Kohn, W., Zabinsky, Z.B., Brayman, V.: Optimization of algorithmic parameters using a meta-control approach. J. Glob. Optim. 34, 293–316 (2006)
Kushner, H.J., Yin, G.G.: Stochastic Approximation and Recursive Algorithms and Applications, 2nd edn. Springer, New York (2003)
Lan, G.: Convex optimization under inexact first-order information. PhD thesis, Georgia Institute of Technology, Atlanta, GA (2009)
Linderoth, J., Shapiro, A., Wright, S.: The empirical behavior of sampling methods for stochastic programming. Ann. Oper. Res. 142, 215–241 (2006)
Mak, W.K., Morton, D.P., Wood, R.K.: Monte Carlo bounding techniques for determining solution quality in stochastic programs. Oper. Res. Lett. 24, 47–56 (1999)
Molvalioglu, O., Zabinsky, Z.B., Kohn, W.: The interacting-particle algorithm with dynamic heating and cooling. J. Glob. Optim. 43, 329–356 (2009)
Munakata, T., Nakamura, Y.: Temperature control for simulated annealing. Phys. Rev. E 64(4), 046127 (2001)
Nemirovski, A., Juditsky, A., Lan, G., Shapiro, A.: Robust stochastic approximation approach to stochastic programming. SIAM J. Optim. 19(4), 1574–1609 (2009)
Norkin, V.I., Pflug, G.C., Ruszczynski, A.: A branch and bound method for stochastic global optimization. Math. Program. 83, 425–450 (1998)
Oppen, J., Woodruff, D.L.: Parametric models of local search progress. Int. Trans. Oper. Res. 16, 627–640 (2009)
Pasupathy, R.: On choosing parameters in retrospective-approximation algorithms for stochastic root finding and simulation optimization. Oper. Res. 58, 889–901 (2010)
Pee, E.Y., Royset, J.O.: On solving large-scale finite minimax problems using exponential smoothing. J. Optim. Theory Appl. 148(2), 390–421 (2011)
Pironneau, O., Polak, E.: Consistent approximations and approximate functions and gradients in optimal control. SIAM J. Control Optim. 41(2), 487–510 (2002)
Polak, E.: Optimization: Algorithms and Consistent Approximations. Springer, New York (1997)
Polak, E., Royset, J.O.: Efficient sample sizes in stochastic nonlinear programming. J. Comput. Appl. Math. 217, 301–310 (2008)
Polak, E., Royset, J.O., Womersley, R.S.: Algorithms with adaptive smoothing for finite minimax problems. J. Optim. Theory Appl. 119(3), 459–484 (2003)
Rockafellar, R.T., Uryasev, S.: Conditional value-at-risk for general loss distributions. J. Bank. Finance 26, 1443–1471 (2002)
Royset, J.O.: Optimality functions in stochastic programming. Math. Program. 135(1–2), 293–321 (2012)
Royset, J.O., Polak, E.: Implementable algorithm for stochastic programs using sample average approximations. J. Optim. Theory Appl. 122(1), 157–184 (2004)
Royset, J.O., Polak, E.: Extensions of stochastic optimization results from problems with simple to problems with complex failure probability functions. J. Optim. Theory Appl. 133(1), 1–18 (2007)
Royset, J.O., Polak, E.: Sample average approximations in reliability-based structural optimization: theory and applications. In: Papadrakakis, M., Tsompanakis, Y., Lagaros, N.D. (eds.) Structural Design Optimization Considering Uncertainties, pp. 307–334. Taylor & Francis, London (2008)
Royset, J.O., Polak, E., Der Kiureghian, A.: Adaptive approximations and exact penalization for the solution of generalized semi-infinite min-max problems. SIAM J. Optim. 14(1), 1–34 (2003)
Rubinstein, R.Y., Kroese, D.P.: The Cross-Entropy Method: a Unified Approach to Combinatorial Optimization, Monte-Carlo Simulation, and Machine Learning. Springer, New York (2004)
Sastry, K., Goldberg, D.E.: Let’s get ready to rumble redux: crossover versus mutation head to head on exponentially scaled problems. In: GECCO’07: Proceedings of the 9th Annual Conference on Genetic and Evolutionary Computation, pp. 1380–1387. ACM, New York (2007)
Schwartz, A., Polak, E.: Consistent approximations for optimal control problems based on Runge-Kutta integration. SIAM J. Control Optim. 34(4), 1235–1269 (1996)
Shapiro, A.: Asymptotic analysis of stochastic programs. Ann. Oper. Res. 30, 169–186 (1991)
Shapiro, A., Dentcheva, D., Ruszczynski, A.: Lectures on Stochastic Programming: Modeling and Theory. SIAM, Philadelphia (2009)
Shapiro, A., Homem-de-Mello, T.: A simulation-based approach to two-stage stochastic programming with recourse. Math. Program. 81, 301–325 (1998)
Shapiro, A., Wardi, Y.: Convergence analysis of stochastic algorithms. Math. Oper. Res. 21(3), 615–628 (1996)
Spall, J.C.: Introduction to Stochastic Search and Optimization. Wiley, New York (2003)
Verweij, B., Ahmed, S., Kleywegt, A.J., Nemhauser, G., Shapiro, A.: Sample average approximation method applied to stochastic routing problems: a computational study. Comput. Optim. Appl. 24(2–3), 289–333 (2003)
Washburn, A.R.: Search and Detection, 4th edn. INFORMS, Linthicum (2002)
Xu, H., Zhang, D.: Smooth sample average approximation of stationary points in nonsmooth stochastic optimization and applications. Math. Program. 119, 371–401 (2009)
Xu, S.: Smoothing method for minimax problems. Comput. Optim. Appl. 20, 267–279 (2001)
Acknowledgements
This study is supported by AFOSR Young Investigator grant F1ATA08337G003. The author is grateful for valuable discussions with Roberto Szechtman, Naval Postgraduate School. The author also thanks Alexander Shapiro, Georgia Institute of Technology, for assistance with two technical results.
Appendix
This appendix includes proofs of results in Sect. 4.
Proof of Proposition 2
By assumption, for any \(i=0,1,\ldots,n_{k}-1\) and \(a\in(0,1)\)
Hence, \(\phi(a) < f_{N_{k}}(x_{n_{k}}^{k})\) and
for any \(i=0,1,\ldots,n_{k}\) and \(a\in(0,1)\). Consequently, the logarithmic transformation of the data in Step 2 of Subroutine B is permissible when \(a_{j}\in(0,1)\), and the regression coefficients \(\log a_{j+1}\) and \(\log b_{j+1}\), \(j=0,1,\ldots\), are given by the standard linear least-squares regression formulae. Specifically,
Since the denominator in (57) simplifies to \((n_{k}^{3} + 3n_{k}^{2} + 2n_{k})/12\), we obtain, using the definition \(\alpha_{i} = 12(i-n_{k}/2)/(n_{k}^{3} + 3n_{k}^{2} + 2n_{k})\) from Proposition 2, that
We find that
and consequently
By definition, \(\alpha_{i} = 12(i-n_{k}/2)/(n_{k}^{3} + 3n_{k}^{2} + 2n_{k}) = -12(n_{k}-i-n_{k}/2)/(n_{k}^{3} + 3n_{k}^{2} + 2n_{k}) = -\alpha_{n_{k}-i}\). Hence,
where we use the fact that \(\alpha_{n_{k}/2} = 0\) when \(n_{k}\) is an even number. The expression for \(g(\cdot)\) then follows by combining the two products. The positivity of \(g(a_{j})\) follows trivially from (56), as \(g(a_{j})\) is a product of positive numbers. Since \(f_{N_{k}}(x_{i}^{k})>f_{N_{k}}(x_{i+1}^{k})\) for all \(i=0,1,\ldots,n_{k}-1\), we have \((f_{N_{k}}(x_{i}^{k})-\phi(a))/(f_{N_{k}}(x_{n_{k}-i}^{k})-\phi(a))<1\) for all \(i=n_{k}^{0}, n_{k}^{0}+1, \ldots, n_{k}\). Moreover, \(\alpha_{i}>0\) for all \(i=n_{k}^{0}, n_{k}^{0}+1, \ldots, n_{k}\). Hence, it follows that \(g(a)<1\). □
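To make the construction concrete, the following is a minimal Python sketch of the fixed-point map \(g\) and of the iteration in Subroutine B, reconstructed from the proof above. The intercept estimate \(\phi(\cdot)\) is defined earlier in the paper and is not reproduced here, so it is passed in as a callable; the function names and the choice \(n_{k}^{0}=\lfloor n_{k}/2\rfloor + 1\) (the first index with \(\alpha_{i}>0\)) are our assumptions.

```python
import numpy as np

def alpha_weights(n):
    """Regression weights from Proposition 2: alpha_i = 12(i - n/2)/(n^3 + 3n^2 + 2n)."""
    i = np.arange(n + 1)
    return 12.0 * (i - n / 2.0) / (n**3 + 3.0 * n**2 + 2.0 * n)

def g(a, f, phi):
    """Fixed-point map of Proposition 2, as reconstructed from the proof:
    the product over i = n0,...,n of the ratios (f_i - phi(a))/(f_{n-i} - phi(a)),
    each raised to alpha_i, where f[i] = f_{N_k}(x_i^k)."""
    f = np.asarray(f, dtype=float)
    n = len(f) - 1
    alpha = alpha_weights(n)
    c = f - phi(a)                        # centered values; positive by (56)
    idx = np.arange(n // 2 + 1, n + 1)    # indices with alpha_i > 0
    return float(np.prod((c[idx] / c[n - idx]) ** alpha[idx]))

def subroutine_b(f, phi, a0=0.5, tol=1e-10, max_iter=10_000):
    """Iterate a_{j+1} = g(a_j) until the change falls below tol; Theorem 4
    below shows the iterates are monotone and convergent for a0 in (0,1)."""
    a = a0
    for _ in range(max_iter):
        a_next = g(a, f, phi)
        if abs(a_next - a) <= tol:
            return a_next
        a = a_next
    return a
```

The resulting fixed point serves as the estimate of the rate-of-convergence coefficient; Theorems 4 and 6 below address convergence of the iterates.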
Proof of Theorem 4
We first show that the derivative \(dg(a)/da\) exists and is positive on (0,1). For any \(a\in(0,1)\) and \(i = n_{k}^{0}, n_{k}^{0}+1, \ldots, n_{k}\), let \(h_{i}:(0,1)\to\mathbb{R}\) be defined by
By (56), \(h_{i}(a)>0\) for any \(a\in(0,1)\) and \(i = n_{k}^{0}, n_{k}^{0}+1, \ldots, n_{k}\), and consequently
By straightforward differentiation we obtain that
where
Since \(f_{N_{k}}(x_{n_{k}}^{k})-f_{N_{k}}(x_{i}^{k})<0\) for all \(i=0,1,\ldots,n_{k}-1\) by assumption, it follows that \(d\phi(a)/da<0\) for all \(a\in(0,1)\). Again by assumption, \(f_{N_{k}}(x_{i}^{k})-f_{N_{k}}(x_{n_{k}-i}^{k})<0\) for all \(i = n_{k}^{0}, n_{k}^{0}+1, \ldots, n_{k}\). Hence, by (56) and the fact that \(\alpha_{i}>0\) for all \(i = n_{k}^{0}, n_{k}^{0}+1, \ldots, n_{k}\), we conclude that \(dg(a)/da>0\) for any \(a\in(0,1)\).
Since \(\{a_{j}\}_{j=0}^{\infty}\) is contained in the compact set [0,1] by Proposition 2, there exists a subsequence \(\{a_{j}\}_{j\in J}\), with \(J\subset\mathbb{N}\), and an \(a^{*}\in[0,1]\) such that \(a_{j}\to^{J} a^{*}\) as \(j\to\infty\). By the mean value theorem, for every \(j=0,1,2,\ldots\) there exists an \(s_{j}\in[0,1]\) such that
Since \(dg(a_{j}+s_{j}(a_{j+1}-a_{j}))/da>0\), it follows that \(\{a_{j}\}_{j=0}^{\infty}\) generated by Subroutine B initialized with \(a_{0}\in(0,1)\) is either strictly increasing or strictly decreasing. That is, if \(a_{0}<a_{1}\), then \(a_{j}<a_{j+1}\) for all \(j\in\mathbb{N}\), and if \(a_{0}>a_{1}\), then \(a_{j}>a_{j+1}\) for all \(j\in\mathbb{N}\). Hence, \(a_{j}\to a^{*}\) as \(j\to\infty\). Similarly, \(g(a_{j})\in(0,1)\) by Proposition 2, and there must exist a subsequence of \(\{g(a_{j})\}_{j=0}^{\infty}\) that converges to a point \(g^{*}\in[0,1]\). Since \(a_{j+1}=g(a_{j})\), \(\{g(a_{j})\}_{j=0}^{\infty}\) is either strictly increasing or strictly decreasing, and therefore \(g(a_{j})\to g^{*}\) as \(j\to\infty\). Since \(a_{j+1}=g(a_{j})\) for all \(j\in\mathbb{N}\), \(a_{j}\to a^{*}\), and \(g(a_{j})\to g^{*}\) as \(j\to\infty\), we have that \(a^{*}=g^{*}\). By continuity of \(g(\cdot)\) on (0,1), if \(a^{*}\in(0,1)\), then \(g(a_{j})\to g(a^{*})\) as \(j\to\infty\), and hence \(g(a^{*})=g^{*}=a^{*}\). If \(a^{*}=0\), then \(g^{*}=0\); if \(a^{*}=1\), then \(g^{*}=1\). Since by definition \(g(0)=0\) and \(g(1)=1\), it follows that \(a^{*}=g(a^{*})\) in these two cases too. The finite termination of Subroutine B follows directly from the fact that \(\{a_{j}\}_{j=0}^{\infty}\) converges. □
Proof of Theorem 5
Since \(f_{N_{k}}(x_{i}^{k})= f_{N_{k}}^{*} + (\theta_{N_{k}})^{i}(f_{N_{k}}(x_{0}^{k}) - f_{N_{k}}^{*})\) for all \(i=0,1,2,\ldots\),
It then follows by (60) that
Since \(\sum_{i=0}^{n_{k}}i^{2} = (n_{k}+1)(n_{k}^{2}/3 + n_{k}/6)\),
Since \(\sum_{i=0}^{n_{k}}\alpha_{i}=0\) by (59), the conclusion follows. □
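For the record, both summation facts invoked in this proof can be verified directly:

\[
\sum_{i=0}^{n_{k}} i^{2} = \frac{n_{k}(n_{k}+1)(2n_{k}+1)}{6}
= (n_{k}+1)\biggl(\frac{n_{k}^{2}}{3}+\frac{n_{k}}{6}\biggr),
\qquad
\sum_{i=0}^{n_{k}} \alpha_{i} \propto \sum_{i=0}^{n_{k}}\biggl(i-\frac{n_{k}}{2}\biggr)
= \frac{n_{k}(n_{k}+1)}{2} - (n_{k}+1)\frac{n_{k}}{2} = 0.
\]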
Proof of Theorem 6
By Theorem 5 and (66), \(\theta_{N_{k}} = g(\theta_{N_{k}})\) and \(\phi(\theta_{N_{k}})=f_{N_{k}}^{*}\). Consequently, it follows from (63) and the assumption of exact linear rate that
Since
we obtain that
If \(n_{k}\) is even, then using (59) we find that
If \(n_{k}\) is odd, then using (59) we find that
Since \(1/(n_{k}+1)<(n_{k}+1)/(n_{k}(n_{k}+2))\), as \((n_{k}+1)^{2}=n_{k}(n_{k}+2)+1>n_{k}(n_{k}+2)\),
for all \(n_{k}=2,3,\ldots\).
The first multiplicative term in (69) decomposes as follows:
Since \(1/(1-\theta_{N_{k}}^{i'})\leq 1/(1-\theta_{N_{k}})\) for any \(i'\in\mathbb{N}\), we obtain from (69), using (70) and (71), that
Using the mean of the geometric distribution (equivalently, by differentiating the geometric series \(\sum_{i=0}^{\infty}\theta^{i}=1/(1-\theta)\) and multiplying by \(\theta\)), we deduce that \(\sum_{i = 1}^{\infty}i\theta_{N_{k}}^{i} = \theta_{N_{k}}/(1-\theta_{N_{k}})^{2}\). Hence,
Consequently, if \(n_{k} > (1+\sqrt{1+72\beta})/6\), where \(\beta= \theta_{N_{k}}/(1-\theta_{N_{k}})^{3}\), then the right-hand side in (73) is less than one. Hence, for \(n_{k} > (1+\sqrt{1+72\beta})/6\), \(dg(\theta_{N_{k}})/da<1\). Since \(\theta_{N_{k}}/(1-\theta_{N_{k}})^{3}\leq 0.99/(1-0.99)^{3}\) for all \(\theta_{N_{k}} \in[0, 0.99]\), it follows that when \(n_{k}\geq 1408\), \(dg(\theta_{N_{k}})/da<1\) for any \(\theta_{N_{k}} \in[0, 0.99]\). It then follows by the fixed-point theorem that, under the assumption that \(n_{k}\geq 1408\), \(a_{j}\to\theta_{N_{k}}\) as \(j\to\infty\) whenever \(a_{0}\) is sufficiently close to \(\theta_{N_{k}}\).
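As a quick numerical check of the stated threshold, a minimal Python computation (the constant follows directly from the bound above):

```python
import math

# Worst case of beta = theta/(1 - theta)^3 over theta in [0, 0.99]
beta = 0.99 / (1 - 0.99) ** 3              # = 990,000
threshold = (1 + math.sqrt(1 + 72 * beta)) / 6
print(threshold)                           # approx. 1407.3, so n_k >= 1408 suffices
```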
It appears difficult to examine \(dg(\theta_{N_{k}})/da\) analytically for \(2<n_{k}<1408\). However, we show that \(dg(\theta_{N_{k}})/da<1\) for all \(\theta_{N_{k}} \in(0, 0.99]\) and \(2\leq n_{k}<1408\) using the following numerical scheme. (The case \(n_{k}=2\) is easily checked analytically, but we omit the details for brevity.) We consider the function \(\gamma:[0,1)\to\mathbb{R}\) defined for any \(\theta\in[0,1)\) by
Obviously, for \(\theta_{N_{k}}\in(0,1)\), \(\gamma(\theta_{N_{k}})=dg(\theta_{N_{k}})/da\). Straightforward differentiation yields that
Hence, for any \(\theta_{\max}\in(0,1)\),
for all \(\theta\in(0,\theta_{\max}]\). Consequently, \(\gamma(\cdot)\) is Lipschitz continuous on \((0,\theta_{\max}]\) with Lipschitz constant \(L\). Hence, it suffices to check \(dg(\theta_{N_{k}})/da\) for \(n_{k}\in\{2,3,\ldots,1407\}\) and a finite number of values of \(\theta_{N_{k}}\) to verify that \(dg(\theta_{N_{k}})/da<1\) for all \(\theta_{N_{k}}\in(0,\theta_{\max}]\). Let \(\tilde{\theta}_{1}, \tilde{\theta}_{2}, \ldots, \tilde{\theta}_{\tilde{k}}\) be these values, computed recursively starting with \(\tilde{\theta}_{1} = 0\) and then by \(\tilde{\theta}_{k+1} = \tilde{\theta}_{k} + (1-\gamma(\tilde{\theta}_{k}))/L\), \(k=1,2,\ldots\), until a value no smaller than \(\theta_{\max}\) is obtained. Let \(\theta_{\max}=0.99\). Since we find that \(\gamma(\tilde{\theta}_{k}) < 1\) for all \(k\) in this case, the Lipschitz continuity of \(\gamma(\cdot)\) on \((0,0.99]\) with constant \(L\) implies that \(\gamma(\theta)<1\) for all \(\theta\in[0,0.99]\). Hence, \(dg(\theta_{N_{k}})/da<1\) for all \(\theta_{N_{k}} \in(0,0.99]\) and \(n_{k}=2,3,\ldots,1407\). The conclusion then follows by the fixed-point theorem. □
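The verification scheme admits a compact implementation. Below is a minimal Python sketch under the assumptions above: `gamma` evaluates \(\gamma(\cdot)\) for a fixed \(n_{k}\) (its formula is not reproduced in this appendix), `L` is the corresponding Lipschitz constant, and the function name is illustrative.

```python
def certify_gamma_below_one(gamma, L, theta_max=0.99):
    """Certify gamma(theta) < 1 on [0, theta_max] for an L-Lipschitz gamma.

    If gamma(t) < 1, then gamma(s) <= gamma(t) + L*(s - t) < 1 for all
    s in [t, t + (1 - gamma(t))/L), which is exactly the step rule
    theta_{k+1} = theta_k + (1 - gamma(theta_k))/L used in the proof.
    """
    t = 0.0
    while t < theta_max:
        g_t = gamma(t)
        if g_t >= 1.0:
            return False              # certificate fails at this grid point
        t += (1.0 - g_t) / L          # largest step the Lipschitz bound covers
    return gamma(theta_max) < 1.0     # also check the right endpoint
```

In the proof, this check is repeated once for each \(n_{k}\in\{2,3,\ldots,1407\}\).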
Proof of Proposition 4
By Proposition 1, \(N^{1/2}(f_{N}^{*} - f^{*})\Rightarrow \mathcal{N}(0, \sigma^{2}(x^{*}))\) as \(N\to\infty\). Let \(\{N_{l}(S_{k})\}_{S_{k}=1}^{\infty}\), \(l=1,2,\ldots,k\), be such that \(N_{l}(S_{k})\in\mathbb{N}\) for all \(S_{k}\in\mathbb{N}\) and \(l=1,2,\ldots,k\), \(\sum_{l=1}^{k} N_{l}(S_{k}) = S_{k}\), and \(N_{l}(S_{k})/S_{k}\to\beta_{l}\in[0,1]\) as \(S_{k}\to\infty\). Consequently, \(\sum_{l=1}^{k} \beta_{l} = 1\). By Slutsky's theorem (see, e.g., Exercise 25.7 of [7]), it then follows that for all \(l=1,2,\ldots,k\),
as \(S_{k}\to\infty\).
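To see how Slutsky's theorem enters, note that for any stage \(l\) with \(\beta_{l}>0\), the displayed limit can be reconstructed as follows (a reconstruction consistent with Proposition 1, not the paper's typeset display):

\[
S_{k}^{1/2}\bigl(f_{N_{l}(S_{k})}^{*} - f^{*}\bigr)
= \biggl(\frac{S_{k}}{N_{l}(S_{k})}\biggr)^{1/2} N_{l}(S_{k})^{1/2}\bigl(f_{N_{l}(S_{k})}^{*} - f^{*}\bigr)
\Rightarrow \mathcal{N}\bigl(0, \sigma^{2}(x^{*})/\beta_{l}\bigr),
\]

since \(S_{k}/N_{l}(S_{k})\to 1/\beta_{l}\).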
Since the sequences \(\{f_{N_{l}}(x_{i}^{l})\}_{i=0}^{n_{l}}\), l=1,2,…,k, converge exactly linearly with coefficient \(\hat{\theta}_{l+1}\), it follows that the minimization in (43) can be ignored and \(\hat{m}_{l} = f_{N_{l}}^{*}\), l=1,2,…,k. Using the recursive formula for \(\hat{f}_{k+1}^{*}\) in Step 2 of Subroutine C, we find that \(\hat{f}_{k+1}^{*} = \sum_{l=1}^{k} N_{l}(S_{k}) f_{N_{l}}^{*}/S_{k}\). Consequently,
It then follows by the continuous mapping theorem and the independence of samples across stages that
as \(S_{k}\to\infty\). The conclusion then follows from the fact that \(\sum_{l=1}^{k} \beta_{l} = 1\). □
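For intuition, the closed form \(\hat{f}_{k+1}^{*} = \sum_{l=1}^{k} N_{l}(S_{k}) f_{N_{l}}^{*}/S_{k}\) is what a sample-size-weighted running average produces. A minimal Python sketch (the exact recursive formula in Step 2 of Subroutine C is not reproduced here, so this update rule is an assumption):

```python
def update_fstar(fstar_prev, S_prev, m_l, N_l):
    """Sample-size-weighted running average: one plausible form of the
    recursive update in Step 2 of Subroutine C. Unrolling the recursion
    gives fstar = sum_l N_l * m_l / S_k."""
    S = S_prev + N_l
    return (S_prev * fstar_prev + N_l * m_l) / S, S

# Usage sketch with illustrative stage estimates m_l and sample sizes N_l:
fstar, S = 0.0, 0
for m_l, N_l in [(1.30, 100), (1.22, 200), (1.18, 400)]:
    fstar, S = update_fstar(fstar, S, m_l, N_l)
print(fstar, S)   # weighted average of the m_l, total sample size S_k
```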
Cite this article
Royset, J.O. On sample size control in sample average approximations for solving smooth stochastic programs. Comput Optim Appl 55, 265–309 (2013). https://doi.org/10.1007/s10589-012-9528-1