
Operations Research Letters

Volume 49, Issue 5, September 2021, Pages 676-681

Central limit theorem and sample complexity of stationary stochastic programs

https://doi.org/10.1016/j.orl.2021.06.019

Abstract

In this paper we discuss the sample complexity of solving stationary stochastic programs by the Sample Average Approximation (SAA) method. We investigate this in the framework of an optimal control (in discrete time) setting. In particular, we derive a Central Limit Theorem type asymptotics for the optimal values of the SAA problems. The main conclusion is that the sample size required to attain a given relative error of the SAA solution is not sensitive to the discount factor, even when the discount factor is very close to one. We consider both the risk neutral and risk averse settings. The presented numerical experiments confirm the theoretical analysis.

Introduction

Consider the following optimal control (in discrete time) infinite horizon problem
$$\min_{u_t \in U}\; \mathbb{E}_P\Big[\sum_{t=0}^{\infty} \gamma^t c(x_t,u_t,\xi_t)\Big] \quad \text{s.t.}\;\; x_{t+1} = F(x_t,u_t,\xi_t). \tag{1.1}$$
Variables $x_t \in \mathbb{R}^n$ represent the state of the system, $u_t \in \mathbb{R}^m$ are controls, $\xi_t \in \mathbb{R}^d$, $t=0,\dots$, is a sequence of independent identically distributed (iid) random vectors (random noise or disturbances) with probability distribution $P$ of $\xi_t$ supported on a set $\Xi \subset \mathbb{R}^d$, $c: X \times \mathbb{R}^m \times \Xi \to \mathbb{R}$ is the cost function, $F: X \times \mathbb{R}^m \times \Xi \to X$ is a measurable mapping, $U \subset \mathbb{R}^m$ and $X \subset \mathbb{R}^n$ are nonempty closed sets, and $\gamma \in (0,1)$ is the discount factor. The value $x_0$ is given (initial conditions). The notation $\mathbb{E}_P$ emphasizes that the expectation is taken with respect to the probability distribution $P$ of $\xi_t$. In this setting, problem (1.1) is the classical formulation of the stationary optimal control (in discrete time) problem (e.g., [2]).

Problem (1.1) can also be considered in the framework of stochastic programming by viewing $y_t = (x_t, u_t)$ as decision variables (e.g., [6]). In case the problem is convex, it is possible to apply a Stochastic Dual Dynamic Programming (SDDP) cutting plane type algorithm for its numerical solution. For periodical infinite horizon stochastic programming problems such algorithms were recently discussed in [7]; problem (1.1) can be viewed as a particular case of the periodical setting with a period of one. In order to solve (1.1) numerically, the (generally continuous) distribution of the random process $\xi_t$ should be discretized. The so-called Sample Average Approximation (SAA) method approaches this by generating a random sample of the (marginal) distribution of $\xi_t$ using Monte Carlo sampling techniques.
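As a minimal sketch of the SAA discretization step, the continuous marginal distribution of $\xi_t$ is replaced by the empirical (uniform) measure on $N$ Monte Carlo draws. The sampler and the exponential distribution below are illustrative assumptions, not taken from the paper:

```python
import random

def saa_draws(sample_xi, N, seed=0):
    """Generate N iid Monte Carlo draws of xi_t; the SAA method replaces
    the (generally continuous) marginal distribution P by the uniform
    (empirical) measure on these N draws."""
    rng = random.Random(seed)
    return [sample_xi(rng) for _ in range(N)]

# hypothetical example: exponentially distributed noise with mean 10
draws = saa_draws(lambda rng: rng.expovariate(1 / 10), N=1000)
empirical_mean = sum(draws) / len(draws)
```

Any quantity of the form $\mathbb{E}_P[g(\xi)]$ is then approximated by the average of $g$ over `draws`.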

This raises the question of the involved sample complexity, i.e., how large the sample size $N$ should be in order for the SAA problem to give an accurate approximation of the original problem. In some applications the discount factor $\gamma$ is very close to one, and it is well known that as the discount factor approaches one, it becomes more difficult to solve problem (1.1). For a given $\gamma \in (0,1)$, the sample complexity of the discretization is discussed in [7], with the derived upper bound on the sample size $N$ being of order $O((1-\gamma)^{-3}\varepsilon^{-2})$ as a function of the discount factor $\gamma$ and the error level $\varepsilon > 0$. Since the optimal value of problem (1.1) increases at the rate $O((1-\gamma)^{-1})$ as $\gamma$ approaches one, in terms of the relative error $(1-\gamma)^{-1}\varepsilon$ this bound implies a required sample size of order $O((1-\gamma)^{-1})$ as a function of $\gamma$. This suggests that increasing $\gamma$ from 0.99 to 0.999 would require increasing the sample size by a factor of 10 in order to achieve roughly the same relative accuracy of the SAA method. However, this is only an upper bound, and some numerical experiments indicate that the relative error of the SAA approach is not very sensitive to increases in the discount factor, even when it is very close to one.
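To see why the relative error can be insensitive to $\gamma$, consider a toy uncontrolled problem in which the stage cost is an iid draw not depending on state or control. Since the SAA reuses the same $N$ draws at every stage, the SAA optimal value is just the sample mean scaled by $(1-\gamma)^{-1}$, so its standard deviation and its mean grow at the same rate and their ratio is free of $\gamma$. This is a hypothetical sketch, not one of the paper's experiments:

```python
import random
import statistics

def saa_optimal_value(N, gamma, rng):
    """SAA optimal value of a toy uncontrolled problem with iid stage
    cost c(xi) = xi of mean 1: the value equals (sample mean) / (1 - gamma),
    because the same N draws are reused at every stage."""
    xs = [rng.expovariate(1.0) for _ in range(N)]
    return sum(xs) / len(xs) / (1 - gamma)

rng = random.Random(1)
rel_err = {}
for gamma in (0.99, 0.999):
    vals = [saa_optimal_value(200, gamma, rng) for _ in range(500)]
    rel_err[gamma] = statistics.stdev(vals) / statistics.mean(vals)
# for both discount factors the relative error is about 1/sqrt(200)
```

Both entries of `rel_err` are near $1/\sqrt{200} \approx 0.07$, while the optimal values themselves differ by a factor of about 10.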

In this paper we investigate the question of sample complexity from a different point of view. We derive a Central Limit Theorem (CLT) type result for the optimal value of the Sample Average Approximation of problem (1.1). We demonstrate that the standard error (standard deviation) of the distribution of the optimal value of the SAA grows at essentially the same rate $O((1-\gamma)^{-1})$ as the respective optimal value. This supports the numerical evidence that the variability of the sampling error of the optimal values, measured in terms of the relative error, is not sensitive to increases in the discount factor, even when the discount factor is very close to one. This is somewhat surprising since, as is well known, the problem becomes more difficult to solve as the discount factor increases. We investigate both the risk neutral and risk averse settings.

The paper is organized as follows. In the next section we present the basic theoretical analysis for the risk neutral and risk averse cases. In particular, we show how the statistical upper bound of the SDDP algorithm can be constructed in the risk averse case. In Section 3 we discuss in detail the classical inventory model. Finally, in Section 4 we present results of numerical experiments. We use the following notation throughout the paper. For a point $\xi$ we denote by $\delta_\xi$ the measure of mass one at $\xi$. For $a \in \mathbb{R}$, $[a]_+ := \max\{0, a\}$.


General analysis

The (classical) Bellman equation for the value function associated with problem (1.1) can be written as
$$V(x) = \inf_{u \in U} \mathbb{E}_P\big[c(x,u,\xi) + \gamma V(F(x,u,\xi))\big], \quad x \in X.$$
Consider the following assumptions.

  • (A1)

    The cost function is bounded, i.e., there is a constant $\kappa > 0$ such that $|c(x,u,\xi)| \le \kappa$ for all $(x,u,\xi) \in X \times U \times \Xi$.

  • (A2)

    The function $c(\cdot,\cdot,\cdot)$ and the mapping $F(\cdot,\cdot,\cdot)$ are continuous on the set $X \times U \times \Xi$.

Let $B(X)$ be the space of bounded functions $g: X \to \mathbb{R}$ equipped with the sup-norm $\|g\| = \sup_{x \in X} |g(x)|$. Then, under assumption (A1), $V(\cdot)$
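Under (A1) the Bellman operator is a contraction on $B(X)$ with modulus $\gamma$, so the value function can be computed by fixed-point (value) iteration, which converges geometrically. A minimal sketch on a finite state/action space, with the expectation over $\xi$ replaced by an SAA average; all problem data in the example are illustrative assumptions:

```python
def value_iteration(states, actions, xi_draws, cost, transition, gamma, tol=1e-8):
    """Fixed-point iteration for the Bellman equation
        V(x) = min_u (1/N) * sum_j [ c(x, u, xi_j) + gamma * V(F(x, u, xi_j)) ],
    with the expectation over xi replaced by the SAA (empirical) average.
    The operator is a gamma-contraction in the sup-norm on B(X)."""
    V = {x: 0.0 for x in states}
    while True:
        V_new = {
            x: min(
                sum(cost(x, u, xi) + gamma * V[transition(x, u, xi)]
                    for xi in xi_draws) / len(xi_draws)
                for u in actions
            )
            for x in states
        }
        if max(abs(V_new[x] - V[x]) for x in states) < tol:
            return V_new
        V = V_new

# illustrative two-state example: the stage cost equals the current state
# and the action deterministically selects the next state, so the optimal
# policy moves to (and stays at) state 0
V = value_iteration(states=[0, 1], actions=[0, 1], xi_draws=[0.0],
                    cost=lambda x, u, xi: x,
                    transition=lambda x, u, xi: u, gamma=0.9)
```

In this example $V(0) = 0$ and $V(1) = 1$, matching the fixed point of the Bellman equation.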

Inventory model

Consider the stationary inventory model (cf. [10])
$$\min_{u_t \ge 0}\; \mathbb{E}\Big[\sum_{t=0}^{\infty} \gamma^t \big(c\,u_t + b\,[D_t - (x_t + u_t)]_+ + h\,[x_t + u_t - D_t]_+\big)\Big] \quad \text{s.t.}\;\; x_{t+1} = x_t + u_t - D_t,$$
where $c, b, h \in \mathbb{R}_+$ are the ordering cost, backorder penalty cost and holding cost per unit, respectively (with $b > c \ge 0$), $x_t$ is the current inventory level, $u_t$ is the order quantity, and $D_t \in \mathbb{R}_+$ is the demand at time $t$, an iid random process. Then the optimal policy is the myopic basestock policy $\bar\pi(x) = [x^* - x]_+$, where
$$x^* = F^{-1}\Big(\frac{b - (1-\gamma)c}{b + h}\Big),$$
with $F(x) = P(D \le x)$ being the cdf of the demand
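The basestock level above can be evaluated directly once the quantile function of the demand is available. A minimal sketch; the cost parameters and the uniform demand on $[0, 100]$ are illustrative assumptions:

```python
def basestock_level(b, h, c, gamma, demand_quantile):
    """Myopic basestock level x* = F^{-1}((b - (1 - gamma) * c) / (b + h)),
    where demand_quantile is the quantile function F^{-1} of the demand."""
    p = (b - (1 - gamma) * c) / (b + h)
    return demand_quantile(p)

def myopic_order(x, x_star):
    """Order-up-to policy pi(x) = [x* - x]_+."""
    return max(0.0, x_star - x)

# hypothetical data: demand uniform on [0, 100], so F^{-1}(p) = 100 * p
x_star = basestock_level(b=5.0, h=1.0, c=1.0, gamma=0.99,
                         demand_quantile=lambda p: 100 * p)
```

With these numbers $x^* = 100 \cdot 4.99/6 \approx 83.17$: inventory below $x^*$ is replenished up to $x^*$, and no order is placed otherwise.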

Numerical illustration

In this section we present numerical illustrations of the sample complexity and the CLT for the control problems for different values of the discount factor $\gamma$. Numerical experiments are performed on the stationary inventory problem and the Brazilian Interconnected Power System problem, in both the risk neutral and risk averse formulations (we refer to [8] for a description of the setting of the Brazilian problem). Some additional numerical results can be found online:

References (10)

  • A. Shapiro et al.

    Risk neutral and risk averse stochastic dual dynamic programming method

    Eur. J. Oper. Res.

    (2013)
  • P. Artzner et al.

    Coherent measures of risk

    Math. Finance

    (1999)
  • D.P. Bertsekas et al.

    Stochastic Optimal Control: The Discrete Time Case

    (1978)
  • J. Frédéric Bonnans et al.

    Perturbation Analysis of Optimization Problems

    (2000)
  • L. Ding et al.

    A Python Package for Multi-Stage Stochastic Programming

    (2019)