Central limit theorem and sample complexity of stationary stochastic programs
Introduction
Consider the following optimal control (in discrete time) infinite horizon problem:
$$\min_{\pi \in \Pi}\; \mathbb{E}^P\Big[\sum_{t=1}^{\infty} \gamma^{t-1} c(x_t, u_t, \xi_t)\Big] \quad \text{s.t.}\ \ x_{t+1} = F(x_t, u_t, \xi_t),\ u_t \in \mathcal{U}, \ t = 1, 2, \ldots \tag{1.1}$$
Variables $x_t \in \mathcal{X}$ represent the state of the system, $u_t \in \mathcal{U}$ are controls, $\xi_t$, $t = 1, 2, \ldots$, is a sequence of independent identically distributed (iid) random vectors (random noise or disturbances) with probability distribution $P$ supported on a set $\Xi$, $c : \mathcal{X} \times \mathcal{U} \times \Xi \to \mathbb{R}$ is the cost function, $F : \mathcal{X} \times \mathcal{U} \times \Xi \to \mathcal{X}$ is a measurable mapping, $\mathcal{X}$ and $\mathcal{U}$ are nonempty closed sets, and $\gamma \in (0,1)$ is the discount factor. Value $x_1$ is given (initial conditions). The notation $\mathbb{E}^P$ emphasises that the expectation is taken with respect to the probability distribution $P$ of $\xi_t$. In such setting, problem (1.1) is the classical formulation of the stationary optimal control (in discrete time) problem (e.g., [2]).
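As a self-contained illustration of the objective in (1.1), the sketch below estimates the discounted expected cost of a fixed stationary policy by Monte Carlo. The dynamics, cost, policy and noise distribution are all hypothetical toy choices, not taken from the paper; the infinite sum is truncated at a horizon $T$, which under boundedness of the cost introduces an error of at most $\gamma^T \bar{c}/(1-\gamma)$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy instance (for illustration only): scalar state, affine
# dynamics, bounded quadratic cost, and a fixed stationary policy pi.
gamma = 0.9
F = lambda x, u, xi: 0.5 * x + u + xi          # x_{t+1} = F(x_t, u_t, xi_t)
c = lambda x, u, xi: min(x**2 + u**2, 100.0)   # cost, bounded as in assumption (A1)
pi = lambda x: -0.4 * x                        # some fixed stationary policy

def mc_discounted_cost(x1, n_traj=200, T=100):
    """Monte Carlo estimate of E[sum_{t=1}^T gamma^{t-1} c(x_t, u_t, xi_t)].

    Truncating at horizon T incurs error at most gamma^T * c_bar / (1 - gamma).
    """
    totals = np.zeros(n_traj)
    for i in range(n_traj):
        x = x1
        for t in range(T):
            xi = rng.normal(0.0, 0.1)          # iid noise xi_t ~ P
            u = pi(x)
            totals[i] += gamma**t * c(x, u, xi)
            x = F(x, u, xi)
    return float(totals.mean())

est = mc_discounted_cost(x1=1.0)
```

Because the cost here is nonnegative and bounded by 100, any estimate must lie between 0 and $100/(1-\gamma)$.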
Problem (1.1) can also be considered in the framework of stochastic programming, by viewing the states and controls as decision variables (e.g., [6]). In case the problem is convex, it is possible to apply a Stochastic Dual Dynamic Programming (SDDP) cutting plane type algorithm for its numerical solution. For periodical infinite horizon stochastic programming problems such algorithms were recently discussed in [7]; problem (1.1) can be viewed as a particular case of the periodical setting with period one. In order to solve (1.1) numerically, the (generally continuous) distribution of the random process $\xi_t$ should be discretized. The so-called Sample Average Approximation (SAA) method approaches this by generating a random sample from the (marginal) distribution of $\xi_t$ using Monte Carlo sampling techniques.
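The discretization step can be sketched in a few lines: the SAA method draws a sample $\xi^1, \ldots, \xi^N$ and replaces the distribution $P$ by the empirical measure $\frac{1}{N}\sum_{j=1}^{N} \delta_{\xi^j}$, so that every expectation becomes a finite average. The particular distribution below (exponential) is a hypothetical choice for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Draw a random sample xi^1, ..., xi^N from the (marginal) distribution P of xi_t.
N = 10_000
sample = rng.exponential(scale=10.0, size=N)   # hypothetical choice of P

# SAA replaces P by the empirical measure (1/N) * sum_j delta_{xi^j}, so any
# expectation E^P[g(xi)] is approximated by the corresponding sample average:
def saa_expectation(g):
    return float(np.mean(g(sample)))

mean_est = saa_expectation(lambda xi: xi)      # estimates E[xi] = 10
```

The resulting SAA problem is then a stochastic program with finitely many scenarios, amenable to cutting plane methods such as SDDP.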
This raises the question of the involved sample complexity, i.e., how large the sample size N should be in order for the SAA problem to give an accurate approximation of the original problem. In some applications the discount factor γ is very close to one, and it is well known that as the discount factor approaches one, it becomes more difficult to solve problem (1.1). For a given accuracy $\varepsilon > 0$, the sample complexity of the discretization is discussed in [7], with the derived upper bound on the sample size N being of order $O(\varepsilon^{-2}(1-\gamma)^{-3})$ as a function of the discount factor γ and the error level $\varepsilon$. Since the optimal value of problem (1.1) increases at the rate of $O((1-\gamma)^{-1})$ as γ approaches one, in terms of the relative error this would imply that the required sample size is of order $O((1-\gamma)^{-1})$ as a function of γ. This suggests that increasing γ from 0.99 to 0.999 would require increasing the sample size by a factor of 10 in order to achieve more or less the same relative accuracy of the SAA method. However, the above is just an upper bound, and some numerical experiments indicate that the relative error of the SAA approach is not very sensitive to increases of the discount factor, even when it is very close to one.
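To make the scaling concrete, assume (as a working hypothesis consistent with the discussion above) that the absolute-accuracy sample size grows like $\varepsilon^{-2}(1-\gamma)^{-3}$ while the optimal value grows like $(1-\gamma)^{-1}$, so that the sample size for fixed relative accuracy grows like $(1-\gamma)^{-1}$. A two-line computation then reproduces the factor of 10 when moving from γ = 0.99 to γ = 0.999; constants are ignored, only the order in γ matters.

```python
def n_relative(gamma):
    # sample size for fixed *relative* accuracy, up to a constant factor
    return (1.0 - gamma) ** -1

def n_absolute(gamma, eps):
    # upper bound of order eps^{-2} (1-gamma)^{-3} for fixed *absolute* accuracy
    return eps ** -2 * (1.0 - gamma) ** -3

factor = n_relative(0.999) / n_relative(0.99)   # about 10
```

By contrast, the absolute-accuracy bound would grow by a factor of about 1000 over the same change of γ.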
In this paper we investigate the question of sample complexity from a different point of view. We derive a Central Limit Theorem (CLT) type result for the optimal value of the Sample Average Approximation of problem (1.1). We demonstrate that the standard error (standard deviation) of the distribution of the optimal value of the SAA grows at more or less the same rate as the respective optimal value. This supports the evidence of numerical experiments that the variability of the sampling error of the optimal values, measured in terms of the relative error, is not sensitive to increases of the discount factor, even when the discount factor is very close to one. This is somewhat surprising since, as is well known, the problem becomes more difficult to solve as the discount factor increases. We investigate both the risk neutral and risk averse settings.
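The CLT effect for SAA optimal values can be illustrated on a toy single-stage newsvendor instance (hypothetical cost parameters and demand distribution, not the model analyzed in the paper): replicating the SAA optimal value over many independent samples, its standard deviation should decay at the canonical $N^{-1/2}$ rate, so quadrupling $N$ should roughly halve the standard deviation.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical newsvendor instance: ordering, backorder and holding costs.
c_, b_, h_ = 1.0, 9.0, 2.0
kappa = (b_ - c_) / (b_ + h_)    # newsvendor critical ratio
lam = 10.0                       # exponential demand with mean lam

def saa_opt_value(N):
    """Optimal value of one SAA replication of the newsvendor problem."""
    D = rng.exponential(lam, N)
    u = np.quantile(D, kappa)    # the SAA minimizer is an empirical quantile
    return c_ * u + np.mean(b_ * np.maximum(D - u, 0.0)
                            + h_ * np.maximum(u - D, 0.0))

M = 400                          # number of independent SAA replications
v_small = np.array([saa_opt_value(500) for _ in range(M)])
v_big = np.array([saa_opt_value(2000) for _ in range(M)])
ratio = v_small.std() / v_big.std()   # CLT predicts about sqrt(2000/500) = 2
```

The ratio of the two empirical standard deviations concentrates around 2, as the CLT for SAA optimal values predicts.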
The paper is organized as follows. In the next section we present the basic theoretical analysis for the risk neutral and risk averse cases. In particular, we show how the statistical upper bound of the SDDP algorithm can be constructed in the risk averse case. In section 3 we discuss in detail the classical inventory model. Finally, in section 4 we present results of numerical experiments. We use the following notation throughout the paper. For a point ξ we denote by $\delta_\xi$ the measure of mass one at ξ. For $a \in \mathbb{R}$, $[a]_+ := \max\{a, 0\}$.
General analysis
The (classical) Bellman equation for the value function $V(\cdot)$, associated with problem (1.1), can be written as
$$V(x) = \inf_{u \in \mathcal{U}} \mathbb{E}\big[c(x, u, \xi) + \gamma V(F(x, u, \xi))\big], \quad x \in \mathcal{X}.$$
Consider the following assumptions.
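On a finite discretization, the Bellman equation above can be solved by fixed point (value) iteration, since the Bellman operator is a contraction with modulus γ in the sup-norm. The instance below (scalar state and control on grids, three-point noise, quadratic cost) is a hypothetical toy, not from the paper; next states are snapped to the nearest grid point.

```python
import numpy as np

# Tiny discretized instance (hypothetical, for illustration).
gamma = 0.9
X = np.linspace(-1.0, 1.0, 21)                       # state grid
U = np.linspace(-1.0, 1.0, 5)                        # control grid
Xi, p = np.array([-0.1, 0.0, 0.1]), np.array([0.25, 0.5, 0.25])  # noise, probs
F = lambda x, u, xi: np.clip(0.5 * x + u + xi, -1.0, 1.0)        # dynamics
c = lambda x, u, xi: x**2 + 0.1 * u**2                           # bounded cost

def bellman(V):
    """Bellman operator: (TV)(x) = min_u E[ c(x,u,xi) + gamma * V(F(x,u,xi)) ]."""
    TV = np.empty_like(V)
    for i, x in enumerate(X):
        best = np.inf
        for u in U:
            nxt = F(x, u, Xi)                                    # one next state per noise value
            idx = np.abs(X[None, :] - nxt[:, None]).argmin(1)    # nearest grid points
            best = min(best, p @ (c(x, u, Xi) + gamma * V[idx]))
        TV[i] = best
    return TV

# Value iteration V_{k+1} = T V_k converges geometrically at rate gamma.
V = np.zeros_like(X)
for _ in range(300):
    V = bellman(V)
```

After a few hundred iterations $V$ is numerically a fixed point of the operator; shifting the input by a constant shifts the output by exactly γ times that constant, reflecting the contraction modulus.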
- (A1)
The cost function $c(x, u, \xi)$ is bounded, i.e., there is a constant $\bar{c} > 0$ such that $|c(x, u, \xi)| \le \bar{c}$ for all $(x, u, \xi) \in \mathcal{X} \times \mathcal{U} \times \Xi$.
- (A2)
The function $c(x, u, \xi)$ and the mapping $F(x, u, \xi)$ are continuous on the set $\mathcal{X} \times \mathcal{U} \times \Xi$.
Let $\mathbb{B}(\mathcal{X})$ be the space of bounded functions $V : \mathcal{X} \to \mathbb{R}$ equipped with the sup-norm $\|V\| := \sup_{x \in \mathcal{X}} |V(x)|$. Then, under the assumption (A1), the Bellman operator maps $\mathbb{B}(\mathcal{X})$ into itself and is a contraction with modulus γ with respect to the sup-norm.
Inventory model
Consider the stationary inventory model (cf. [10]):
$$\min\; \mathbb{E}\Big[\sum_{t=1}^{\infty} \gamma^{t-1}\big(c\,(u_t - x_t) + b\,[D_t - u_t]_+ + h\,[u_t - D_t]_+\big)\Big] \quad \text{s.t.}\ \ x_{t+1} = u_t - D_t,\ u_t \ge x_t,$$
where $c$, $b$ and $h$ are the ordering cost, backorder penalty cost and holding cost per unit, respectively (with $b > c \ge 0$, $h \ge 0$), $x_t$ is the current inventory level, $u_t - x_t$ is the order quantity, and $D_t$ is the demand at time $t$, which is a random iid process. Then the optimal policy is the myopic basestock policy $\bar{u}_t = \max\{x_t, x^*\}$, where $x^* = F^{-1}(\kappa)$ with $\kappa = (b - (1-\gamma)c)/(b + h)$ and $F$ being the cdf of the demand $D_t$.
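The basestock level can be computed and sanity-checked numerically. The sketch below (hypothetical cost parameters and exponential demand) takes the critical ratio in the form $\kappa = (b - (1-\gamma)c)/(b+h)$, which follows from minimizing the equivalent one-step cost $c(1-\gamma)u + \mathbb{E}\big[b[D-u]_+ + h[u-D]_+\big]$ obtained after crediting leftover stock at the discounted ordering cost, and verifies the quantile formula against a brute-force grid search on an SAA of that one-step objective.

```python
import numpy as np

rng = np.random.default_rng(4)

c_, b_, h_ = 1.0, 9.0, 2.0        # ordering, backorder and holding costs (b > c)
gamma = 0.99
lam = 10.0                        # exponential demand with mean lam (hypothetical)

# Myopic basestock level x* = F^{-1}(kappa), kappa = (b - (1-gamma)c)/(b + h).
kappa = (b_ - (1.0 - gamma) * c_) / (b_ + h_)
x_star = -lam * np.log(1.0 - kappa)   # inverse cdf of the exponential distribution

# Brute-force check: minimize the one-step SAA objective over a grid of order-up-to
# levels, using common random numbers across grid points.
D = rng.exponential(lam, 200_000)
grid = np.linspace(0.0, 60.0, 241)
obj = [c_ * (1.0 - gamma) * u
       + np.mean(b_ * np.maximum(D - u, 0.0) + h_ * np.maximum(u - D, 0.0))
       for u in grid]
u_hat = grid[int(np.argmin(obj))]
```

The grid minimizer agrees with the closed-form quantile up to sampling and grid resolution.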
Numerical illustration
In this section, we present numerical illustrations of the sample complexity and CLT for the control problems for different values of the discount factor γ. Numerical experiments are performed on the stationary inventory problem and the Brazilian Inter-connected Power System problem, in both the risk neutral and risk averse formulations (we refer to [8] for the description of the setting of the Brazilian problem). Some additional numerical results can be found online.
References (10)
- A. Shapiro, W. Tekaya, J.P. da Costa, M.P. Soares, Risk neutral and risk averse stochastic dual dynamic programming method, Eur. J. Oper. Res. (2013)
- P. Artzner, F. Delbaen, J.-M. Eber, D. Heath, Coherent measures of risk, Math. Finance (1999)
- D.P. Bertsekas, S.E. Shreve, Stochastic Optimal Control: The Discrete Time Case (1978)
- J.F. Bonnans, A. Shapiro, Perturbation Analysis of Optimization Problems (2000)
- L. Ding, S. Ahmed, A. Shapiro, A Python Package for Multi-Stage Stochastic Programming (2019)