Abstract
Approximating stochastic processes by scenario trees is important in decision analysis. In this paper we focus on improving the quality with which a given process is approximated by a smaller, tractable tree. In particular we propose and analyze an iterative algorithm to construct improved approximations: given a stochastic process in discrete time and starting with an arbitrary approximating tree, the algorithm improves both the probabilities on the tree and the related path-values of the smaller tree, leading to significantly improved approximations of the initial stochastic process. The quality of the approximation is measured by the process distance (nested distance), which was introduced recently. For the important case of quadratic process distances the algorithm finds locally best approximating trees in finitely many iterations by generalizing multistage k-means clustering.
Notes
Notice the notational difference: d is the distance function on the original space \(\Xi \), while \({\mathsf {d}}_{r}\) denotes the Wasserstein distance.
In the context of transportation and transportation plans, the paths of the stochastic process are called locations.
The selection has to be chosen in a measurable way.
See also Dupačová et al. (2003), Theorem 2.
An \({\mathcal {F}}\)-measurable set \(a\in {\mathcal {F}}\) is an atom if \(b\subsetneq a\) implies that \(P\left( b\right) =0\).
References
Bally, V., Pagès, G., & Printems, J. (2005). A quantization tree method for pricing and hedging multidimensional American options. Mathematical Finance, 15(1), 119–168.
Beiglböck, M., Goldstern, M., Maresch, G., & Schachermayer, W. (2009). Optimal and better transport plans. Journal of Functional Analysis, 256(6), 1907–1927.
Beiglböck, M., Léonard, C., & Schachermayer, W. (2012). A general duality theorem for the Monge–Kantorovich transport problem. Studia Mathematica, 209, 151–167.
Drezner, Z., & Hamacher, H. W. (2002). Facility location: Applications and theory. New York, NY: Springer.
Dudley, R. M. (1969). The speed of mean Glivenko–Cantelli convergence. The Annals of Mathematical Statistics, 40(1), 40–50.
Dupačová, J., Gröwe-Kuska, N., & Römisch, W. (2003). Scenario reduction in stochastic programming. Mathematical Programming, Series A, 95(3), 493–511.
Durrett, R. A. (2004). Probability: Theory and examples (2nd ed.). Belmont, CA: Duxbury Press.
Graf, S., & Luschgy, H. (2000). Foundations of quantization for probability distributions (vol. 1730), Lecture notes in mathematics. Berlin, Heidelberg: Springer.
Heitsch, H., & Römisch, W. (2003). Scenario reduction algorithms in stochastic programming. Computational Optimization and Applications, 24(2–3), 187–206.
Heitsch, H., & Römisch, W. (2007). A note on scenario reduction for two-stage stochastic programs. Operations Research Letters, 35(6), 731–738.
Heitsch, H., & Römisch, W. (2009a). Scenario tree modeling for multistage stochastic programs. Mathematical Programming Series A, 118, 371–406.
Heitsch, H., & Römisch, W. (2009b). Scenario tree reduction for multistage stochastic programs. Computational Management Science, 6(2), 117–133.
Heitsch, H., & Römisch, W. (2011). Stability and scenario trees for multistage stochastic programs. In G. Infanger (Ed.), Stochastic programming, volume 150 of international series in operations research & management science, pp. 139–164. New York: Springer.
Heitsch, H., Römisch, W., & Strugarek, C. (2006). Stability of multistage stochastic programs. SIAM Journal on Optimization, 17(2), 511–525.
Høyland, K., & Wallace, S. W. (2001). Generating scenario trees for multistage decision problems. Management Science, 47, 295–307.
King, A. J., & Wallace, S. W. (2013). Modeling with stochastic programming, volume XVI of Springer Series in Operations Research and Financial Engineering. Berlin: Springer.
Lloyd, S. P. (1982). Least squares quantization in PCM. IEEE Transactions on Information Theory, 28(2), 129–137.
Nocedal, J. (1980). Updating quasi-Newton matrices with limited storage. Mathematics of Computation, 35(151), 773–782.
Pflug, G. C., & Römisch, W. (2007). Modeling, measuring and managing risk. River Edge, NJ: World Scientific.
Pflug, G. C. (2009). Version-independence and nested distribution in multistage stochastic optimization. SIAM Journal on Optimization, 20, 1406–1420.
Pflug, G. C., & Pichler, A. (2012). A distance for multistage stochastic optimization models. SIAM Journal on Optimization, 22(1), 1–23.
Pichler, A. (2013). Evaluations of risk measures for different probability measures. SIAM Journal on Optimization, 23(1), 530–551.
Rachev, S. T. (1991). Probability metrics and the stability of stochastic models. West Sussex: Wiley.
Rachev, S. T., & Rüschendorf, L. (1998). Mass transportation problems vol. I: Theory, vol. II: Applications, volume XXV of Probability and its applications. New York: Springer.
Römisch, W. (2003). Stability of stochastic programming problems. In A. Ruszczyński & A. Shapiro (Eds.), Stochastic programming, handbooks in operations research and management science, volume 10, chapter 8. Amsterdam: Elsevier.
Ruszczyński, A. (2006). Nonlinear optimization. Princeton: Princeton University Press.
Schachermayer, W., & Teichmann, J. (2009). Characterization of optimal transport plans for the Monge–Kantorovich problem. Proceedings of the American Mathematical Society, 137(2), 519–529.
Shapiro, A. (2010). Computational complexity of stochastic programming: Monte Carlo sampling approach. In Proceedings of the international congress of mathematicians, pp. 2979–2995, Hyderabad, India.
Shapiro, A., & Nemirovski, A. (2005). On complexity of stochastic programming problems. In V. Jeyakumar & A. M. Rubinov (Eds.), Continuous optimization: Current trends and applications (pp. 111–144). Berlin: Springer.
Shiryaev, A. N. (1996). Probability. New York: Springer.
Vershik, A. M. (2006). Kantorovich metric: Initial history and little-known applications. Journal of Mathematical Sciences, 133(4), 1410–1417.
Villani, C. (2003). Topics in optimal transportation (vol. 58), Graduate Studies in Mathematics. Providence, RI: American Mathematical Society.
Villani, C. (2009). Optimal transport, old and new (vol. 338), Grundlehren der Mathematischen Wissenschaften. Berlin: Springer.
Williams, D. (1991). Probability with martingales. Cambridge: Cambridge University Press.
Acknowledgments
We wish to thank two anonymous referees for their constructive criticism and their dedication in reviewing the paper. Their valuable comments significantly improved the content and the presentation. Parts of this paper are addressed in the book Multistage Stochastic Optimization (Springer) by Pflug and Pichler, which also covers many more topics in multistage stochastic optimization and which had to be completed before final acceptance of this paper.
Ethics declarations
Funding
This research was partially funded by the Austrian science fund FWF, project P 24125-N13 and by the Research Council of Norway, Grant 207690/E20.
Additional information
Raimund M. Kovacevic: This research was partially funded by the Austrian science fund FWF, project P 24125-N13.
Alois Pichler: The author gratefully acknowledges support of the Research Council of Norway (Grant 207690/E20).
Appendices
Appendix 1: Scenario approximation with Wasserstein distances
Given a probability measure P we ask for an approximating probability measure located on \(\Xi ^{\prime }\), that is to say, a measure whose support is contained in \(\Xi ^{\prime }\). The following proposition reveals that the pushforward measure \(P^{{\mathbf {T}}}\), where the mapping \({\mathbf {T}}\) is defined in (ii) of the proposition, is the best approximation of P among all measures located on \(\Xi ^{\prime }\); that is, \(P^{{\mathbf {T}}}\) attains the lower bound (28) below.
Proposition 1
(Lower bounds and best approximation) Let P and \(P^{\prime }\) be probability measures.
(i)
The Wasserstein distance has the lower bound
$$\begin{aligned} \int \nolimits _{\Xi }\min _{\xi ^{\prime }\in \Xi ^{\prime }}d\left( \xi ,\xi ^{\prime }\right) ^{r}P\left( {\mathrm {d}}\xi \right) \le {\mathsf {d}}_{r}\left( P,\,P^{\prime }\right) ^{r}. \end{aligned}$$(28)
(ii)
The lower bound in (28) is attained for the pushforward measure \(P^{{\mathbf {T}}}:=P\circ {\mathbf {T}}^{-1}\) on \(\Xi ^{\prime }\), where the transport map \({\mathbf {T}}:\Xi \rightarrow \Xi ^{\prime }\) is defined by (see footnote 3)
$$\begin{aligned} {\mathbf {T}}\left( \xi \right) \in \mathop {\hbox {argmin}}\limits _{\xi ^{\prime }\in \Xi ^{\prime }}d\left( \xi ,\xi ^{\prime }\right) . \end{aligned}$$
It holds that (see footnote 4)
$$\begin{aligned} {\mathsf {d}}_{r}\left( P,\,P^{{\mathbf {T}}}\right) ^{r}=\int \min _{\xi ^{\prime }\in \Xi ^{\prime }}d\left( \xi ,\xi ^{\prime }\right) ^{r}P\left( {\mathrm {d}}\xi \right) ={\mathbb {E}}\left[ d\left( {\text {id}}_{\Xi },{\mathbf {T}}\left( {\text {id}}_{\Xi }\right) \right) ^{r}\right] , \end{aligned}$$where the identity \({\text {id}}_{\Xi }\left( \xi \right) =\xi \) on \(\Xi \) is employed for notational convenience.
(iii)
If \(\Xi =\Xi ^{\prime }\) is a vector space and \({\mathbf {T}}\) as in (ii), then
$$\begin{aligned} {\mathsf {d}}_{r}\left( P,\,P^{\tilde{{\mathbf {T}}}}\right) \le {\mathsf {d}}_{r}\left( P,\,P^{{\mathbf {T}}}\right) , \end{aligned}$$where \(\tilde{{\mathbf {T}}}\) is defined by \(\tilde{{\mathbf {T}}}\left( \xi \right) :={\mathbb {E}}_{P}\left[ \tilde{\xi }\left| \,{\mathbf {T}}\left( \tilde{\xi }\right) ={\mathbf {T}}\left( \xi \right) \right. \right] \).
Proof
Let \(\pi \) have the marginals P and \(P^{\prime }\). Then
$$\begin{aligned} \int \int d\left( \xi ,\xi ^{\prime }\right) ^{r}\pi \left( {\mathrm {d}}\xi ,{\mathrm {d}}\xi ^{\prime }\right) \ge \int \nolimits _{\Xi }\min _{\xi ^{\prime }\in \Xi ^{\prime }}d\left( \xi ,\xi ^{\prime }\right) ^{r}P\left( {\mathrm {d}}\xi \right) . \end{aligned}$$
Taking the infimum over \(\pi \) reveals the lower bound (28).
Define the transport plan \(\pi :=P\circ \left( {\text {id}}_{\Xi }\times {\mathbf {T}}\right) ^{-1}\) by employing the transport map \({\mathbf {T}}\). Then
$$\begin{aligned} \int \int d\left( \xi ,\xi ^{\prime }\right) ^{r}\pi \left( {\mathrm {d}}\xi ,{\mathrm {d}}\xi ^{\prime }\right) =\int d\left( \xi ,{\mathbf {T}}\left( \xi \right) \right) ^{r}P\left( {\mathrm {d}}\xi \right) =\int \min _{\xi ^{\prime }\in \Xi ^{\prime }}d\left( \xi ,\xi ^{\prime }\right) ^{r}P\left( {\mathrm {d}}\xi \right) . \end{aligned}$$
Moreover \(\pi \) is feasible, as it has the marginals \(\pi \left( A\times \Xi ^{\prime }\right) =P\left( \left\{ \xi :\xi \in A,\,{\mathbf {T}}\left( \xi \right) \in \Xi ^{\prime }\right\} \right) =P\left( A\right) \) and \(\pi \left( \Xi \times B\right) =P\left( \left\{ \xi :{\mathbf {T}}\left( \xi \right) \in B\right\} \right) =P^{{\mathbf {T}}}\left( B\right) \). For this measure \(\pi \) thus
$$\begin{aligned} {\mathsf {d}}_{r}\left( P,\,P^{{\mathbf {T}}}\right) ^{r}\le \int \min _{\xi ^{\prime }\in \Xi ^{\prime }}d\left( \xi ,\xi ^{\prime }\right) ^{r}P\left( {\mathrm {d}}\xi \right) , \end{aligned}$$
which, together with the lower bound (28), proves (ii).
For the last assertion apply the conditional Jensen’s inequality (cf., e.g., Williams 1991) \(\varphi \left( {\mathbb {E}}\left( X|{\mathbf {T}}\right) \right) \le {\mathbb {E}}\left( \varphi \left( X\right) |{\mathbf {T}}\right) \) to the convex mapping \(\varphi :y\mapsto d\left( \xi ,y\right) ^{r}\) and obtain
The measure \(\tilde{\pi }\left( A\times B\right) :=P\left( A\cap \tilde{{\mathbf {T}}}^{-1}\left( B\right) \right) \) has marginals P and \(P^{\tilde{{\mathbf {T}}}}\), from which it follows that
which is the assertion. \(\square \)
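To illustrate Proposition 1 (ii) numerically, here is a minimal sketch of ours (not from the paper): for a discrete measure on the real line, the nearest-point map \({\mathbf {T}}\) is computed, its pushforward probabilities accumulated, and the attained distance \({\mathsf {d}}_{r}(P,P^{{\mathbf {T}}})^{r}={\mathbb {E}}\,d({\text {id}},{\mathbf {T}})^{r}\) evaluated. The one-dimensional setting and all names are illustrative assumptions.

```python
import numpy as np

def pushforward(xi, p, supp, r=2):
    """Sketch: map each atom xi_i of P = sum_i p_i * delta_{xi_i} to its
    nearest point in `supp` (the role of Xi') and accumulate probabilities.
    Returns (p_star, dist_r) with dist_r = E[d(id, T(id))^r]."""
    xi = np.asarray(xi, dtype=float)
    p = np.asarray(p, dtype=float)
    supp = np.asarray(supp, dtype=float)
    # T(xi_i) = argmin_{q in supp} |xi_i - q|
    j = np.abs(xi[:, None] - supp[None, :]).argmin(axis=1)
    p_star = np.zeros(len(supp))
    np.add.at(p_star, j, p)                       # pushforward probabilities
    dist_r = float(np.sum(p * np.abs(xi - supp[j]) ** r))
    return p_star, dist_r

p_star, d2 = pushforward([0.0, 0.4, 1.0], [0.25, 0.5, 0.25], [0.0, 1.0])
# the atom at 0.4 is mapped to 0.0, so p_star = [0.75, 0.25] and d2 = 0.5 * 0.4**2
```

By Proposition 1 no measure supported on the same two points can come closer to P in the Wasserstein sense.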
As addressed in the introduction, the approximation can be improved by relocating the scenarios themselves and by allocating adapted probabilities to these scenarios. The following two sections treat these issues by applying Proposition 1.
1.1 Optimal probabilities
The optimal measure \(P^{{\mathbf {T}}}\) in Proposition 1 notably does not depend on the order r. Moreover, given a probability measure P, Proposition 1 (ii) allows us to find the best approximation located on finitely many points \(Q=\left\{ q_{1},\dots ,q_{n}\right\} \). The points \(q_{j}\in Q\) are often called quantizers, and we adopt this notion in what follows (see the œuvre of Gilles Pagès, e.g., Bally et al. (2005), for a comprehensive treatment).
Consider now \(\Xi ^{\prime }:=Q\) and define \(p_{j}^{*}:=P\left( {\mathbf {T}}=q_{j}\right) \); the collection of distinct sets \(\left\{ {\mathbf {T}}=q_{j}\right\} \) is a tessellation of \(\Xi \) (a Voronoi tessellation, see Graf and Luschgy 2000). Set \(P^{Q}:=P^{{\mathbf {T}}}=\sum \nolimits _{j}p_{j}^{*}\cdot \delta _{q_{j}}\), as above. Then \({\mathsf {d}}_{r}\left( P,\,P^{Q}\right) ^{r}=\int \min _{q\in Q}d\left( \xi ,q\right) ^{r}P\left( {\mathrm {d}}\xi \right) \), and no better approximation is possible by Proposition 1.
According to Proposition 1 the best approximating measure for \(P=\sum \nolimits _{i}p_{i}\delta _{\xi _{i}}\), which is located on Q, is given by \(P^{Q}=\sum \nolimits _{j}p_{j}^{*}\delta _{q_{j}}\). For a discrete measure this can be formulated as a linear program,
$$\begin{aligned} \min \left\{ \sum \nolimits _{i,j}d\left( \xi _{i},q_{j}\right) ^{r}\pi _{i,j}:\;\pi _{i,j}\ge 0,\ \sum \nolimits _{j}\pi _{i,j}=p_{i}\right\} , \end{aligned}$$
which is solved by the optimal transport plan
$$\begin{aligned} \pi _{i,j}^{*}:={\left\{ \begin{array}{ll} p_{i} &{}\quad \text {if }d\left( \xi _{i},q_{j}\right) =\min \nolimits _{j^{\prime }}d\left( \xi _{i},q_{j^{\prime }}\right) ,\\ 0 &{}\quad \text {otherwise} \end{array}\right. } \end{aligned}$$(29)
(choosing one minimizer j per i in case of ties), such that
$$\begin{aligned} p_{j}^{*}=\sum \nolimits _{i}\pi _{i,j}^{*}. \end{aligned}$$(30)
Observe as well that the matrix \(\pi ^{*}\) in (29) has just \(\left| \Xi \right| \) non-zero entries, as in every row i of \(\pi ^{*}\) there is just one non-zero entry \(\pi _{i,j}^{*}\). This is a simplification in comparison with Remark 2, as the solution \(\pi \) of (4) has \(\left| \Xi \right| +\left| \Xi ^{\prime }\right| -1\) non-zero entries, if the probability measure \(P^{\prime }\) is specified.
Finally, given the support points Q, it is an easy exercise to look up the closest points according to (29) and to sum up their probabilities according to (30), such that the solution of (27), the closest measure to P located on Q, is immediately obtained as \(P^{Q}=\sum \nolimits _{j}p_{j}^{*}\delta _{q_{j}}\).
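The sparsity of the optimal plan can be made concrete in a small sketch of ours (one-dimensional data and all names are illustrative assumptions): each row i of \(\pi ^{*}\) carries its whole mass \(p_{i}\) in the single column of the nearest quantizer, so the plan has exactly \(\left| \Xi \right| \) non-zero entries and its column sums give \(p_{j}^{*}\).

```python
import numpy as np

def optimal_plan(xi, p, q):
    """Sketch of the plan in (29): pi*_{i,j} = p_i if q_j is nearest to
    xi_i (ties broken by the first minimizer), 0 otherwise."""
    xi, q = np.asarray(xi, dtype=float), np.asarray(q, dtype=float)
    j = np.abs(xi[:, None] - q[None, :]).argmin(axis=1)
    pi = np.zeros((len(xi), len(q)))
    pi[np.arange(len(xi)), j] = p                 # one non-zero entry per row
    return pi

pi = optimal_plan([0.0, 0.4, 0.9, 1.0], [0.4, 0.3, 0.2, 0.1], [0.0, 1.0])
# column sums give p*_j as in (30); the number of non-zeros equals |Xi| = 4
```

This contrasts with a fully specified second marginal, where an optimal plan may need \(\left| \Xi \right| +\left| \Xi ^{\prime }\right| -1\) non-zero entries.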
1.2 Optimal supporting points—facility location
Given the previous results on optimal probabilities, the problem of finding a sufficiently good approximation of P in the Wasserstein distance is reduced to the problem of finding good locations Q, that is, to minimize the function
$$\begin{aligned} \left( q_{1},\dots ,q_{n}\right) \mapsto \int \min \nolimits _{j}d\left( \xi ,q_{j}\right) ^{r}P\left( {\mathrm {d}}\xi \right) . \end{aligned}$$(33)
Minimizing (33) with respect to the quantizers \(\left\{ q_{1},\dots ,q_{n}\right\} \) is often referred to as facility location, as in Drezner and Hamacher (2002). This problem is not convex, and no closed form solution exists in general; it hence has to be handled with adequate numerical algorithms. Moreover, it is well known that facility location problems are NP-hard.
For the important case of the quadratic Wasserstein distance, Proposition 1 (iii) and its proof give rise to an adaptation of the k-means clustering algorithm [also referred to as Lloyd's algorithm, cf. Lloyd (1982)], which is described in Algorithm 2. In this case the conditional average is the best approximation in terms of the Euclidean norm, such that the algorithm terminates after finitely many iterations at a local minimum.
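The alternation just described can be sketched in a few lines. The following is our single-stage, one-dimensional simplification of the idea behind Algorithm 2 (it is not the multistage algorithm of the paper; all names are illustrative): step (a) reassigns atoms to their nearest quantizer as in Proposition 1 (ii), step (b) replaces each quantizer by the conditional mean of its Voronoi cell as in Proposition 1 (iii), and the loop stops once the assignment is stable.

```python
import numpy as np

def lloyd(xi, p, q0, max_iter=100):
    """Sketch of a Lloyd-type iteration for the quadratic Wasserstein
    distance: alternate nearest-quantizer assignment and conditional
    means until the Voronoi assignment no longer changes."""
    xi = np.asarray(xi, dtype=float)
    p = np.asarray(p, dtype=float)
    q = np.asarray(q0, dtype=float).copy()
    for _ in range(max_iter):
        # (a) Voronoi assignment: nearest quantizer for every atom
        j = np.abs(xi[:, None] - q[None, :]).argmin(axis=1)
        # (b) conditional means of the cells (k-means centroid update)
        q_new = q.copy()
        for k in range(len(q)):
            mask = j == k
            if p[mask].sum() > 0:
                q_new[k] = (p[mask] * xi[mask]).sum() / p[mask].sum()
        if np.allclose(q_new, q):                 # assignment is stable
            break
        q = q_new
    p_star = np.zeros(len(q))
    np.add.at(p_star, j, p)                       # optimal probabilities
    return q, p_star

q, p_star = lloyd([0.0, 1.0, 9.0, 10.0], [0.25, 0.25, 0.25, 0.25], [0.0, 10.0])
# the two clusters {0, 1} and {9, 10} yield q = [0.5, 9.5], p* = [0.5, 0.5]
```

Since only finitely many Voronoi assignments exist, the loop terminates; this mirrors the finite-termination argument of Theorem 4.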
Theorem 4
The measures \(P^{k}\) generated by Algorithm 2 are improved approximations of P; they satisfy \({\mathsf {d}}_{r}\left( P,\,P^{k+1}\right) \le {\mathsf {d}}_{r}\left( P,\,P^{k}\right) \),
and the algorithm terminates after finitely many iterations.
In the case of the quadratic Wasserstein distance Algorithm 2 terminates at a local minimum \(\left\{ q_{1},\dots q_{n}\right\} \) of (33).
Proof
Algorithm 2 is an iterative refinement technique which finds the measure \(P^{k}\) after k iterations. By construction of (32) it is an improvement due to Proposition 1 (ii) and (iii), and hence \({\mathsf {d}}_{r}\left( P,\,P^{k+1}\right) \le {\mathsf {d}}_{r}\left( P,\,P^{k}\right) \).
The algorithm terminates after finitely many iterations because there are just finitely many Voronoi-combinations \(T_{j}\).
For the Euclidean distance and \(r=2\) the expectation \({\mathbb {E}}\left( \xi \right) =\sum \nolimits _{i}p_{i}\xi _{i}\) minimizes the function
$$\begin{aligned} y\mapsto \sum \nolimits _{i}p_{i}\left\| \xi _{i}-y\right\| ^{2}. \end{aligned}$$
In this case \(P^{k}\) thus is a local minimum of (33). \(\square \)
For distances other than the quadratic Wasserstein distance, \(P^{k}\) is possibly a good starting point for solving (33), but in general it is not a local (or global) minimum.
Appendix 2: Stochastic processes and trees
1.1 Any tree induces a filtration
Any tree with height T and finitely many nodes \({\mathcal {N}}\) naturally induces a filtration \({\mathcal {F}}\): first use \({\mathcal {N}}_{T}\) as sample space. For any \(n\in {\mathcal {N}}\) define the atom (see footnote 5) \(a\left( n\right) \subset {\mathcal {N}}_{T}\) in a backward recursive way: \(a\left( n\right) :=\left\{ n\right\} \) for \(n\in {\mathcal {N}}_{T}\), and \(a\left( n\right) :=\bigcup \left\{ a\left( m\right) :\,m\text { is a direct successor of }n\right\} \) otherwise.
Employing these atoms, the related sigma algebras are defined by \({\mathcal {F}}_{t}:=\sigma \left( \left\{ a\left( n\right) :\,n\in {\mathcal {N}}_{t}\right\} \right) \).
From the construction of the atoms it is evident that \({\mathcal {F}}_{0}=\left\{ \emptyset ,\,{\mathcal {N}}_{T}\right\} \) for a rooted tree and that \({\mathcal {F}}=\left( {\mathcal {F}}_{0},\ldots ,{\mathcal {F}}_{T}\right) \) is a filtration on the sample space \({\mathcal {N}}_{T}\), i.e. it holds that \({\mathcal {F}}_{t}\subset {\mathcal {F}}_{t+1}\). Notice that node m is a predecessor of n, i.e. \(m\in {\mathcal {A}}(n)\), if and only if \(a\left( n\right) \subset a\left( m\right) \).
Employing the atoms \(a\left( n\right) \) a tree process can be defined by
$$\begin{aligned} \nu _{t}\left( i\right) :=n\quad \text {whenever }i\in a\left( n\right) \text { and }n\in {\mathcal {N}}_{t}, \end{aligned}$$
such that each \(\nu _{t}:{\mathcal {N}}_{T}\rightarrow {\mathcal {N}}_{t}\) is \({\mathcal {F}}_{t}\)-measurable. Moreover, the process \(\nu =\left( \nu _{0},\ldots ,\nu _{T}\right) \) is adapted to its natural filtration, i.e. \({\mathcal {F}}_{t}=\sigma \left( \nu _{t}\right) \).
It is natural to introduce the notation \(i_{t}:=\nu _{t}\left( i\right) \), which denotes the state of the tree process for any final outcome \(i\in {\mathcal {N}}_{T}\) at stage t. It then holds that \(i_{T}=i\), moreover \(i_{t}\in {\mathcal {A}}(i_{\tau })\) whenever \(t\le \tau \), and finally, for a rooted tree, \(i_{0}=0\). The sample path from the root node 0 to a final node \(i\in {\mathcal {N}}_{T}\) is \(\left( i_{0},i_{1},\ldots ,i_{T}\right) =\left( 0,i_{1},\ldots ,i\right) \).
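The atoms \(a(n)\) and the sample paths \((i_0,\ldots ,i_T)\) can be computed for a concrete small tree. The following sketch is our own illustration (the predecessor-map encoding and the node labels are assumptions, not the paper's notation): each node stores its direct predecessor, the path of a leaf is obtained by walking up to the root, and the atom of a node collects all leaves passing through it.

```python
def atoms_and_paths(pred, leaves):
    """pred[n] = direct predecessor of node n (the root maps to None).
    Returns the atoms a(n) as sets of leaves and the sample paths."""
    def path(i):
        p = [i]
        while pred[p[-1]] is not None:
            p.append(pred[p[-1]])
        return list(reversed(p))                  # (i_0, ..., i_T), i_0 = root
    paths = {i: path(i) for i in leaves}
    atom = {}
    for i, pth in paths.items():
        for n in pth:
            atom.setdefault(n, set()).add(i)      # leaf i belongs to a(n)
    return atom, paths

# a rooted tree with T = 2: root 0, inner nodes 1, 2, leaves 3, 4, 5
pred = {0: None, 1: 0, 2: 0, 3: 1, 4: 1, 5: 2}
atom, paths = atoms_and_paths(pred, [3, 4, 5])
# a(0) = {3, 4, 5}, a(1) = {3, 4}, and the path of leaf 5 is [0, 2, 5]
```

The nesting of the atoms along each path reflects exactly the inclusion \(a(n)\subset a(m)\) for a predecessor m of n.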
1.2 Any filtration induces a tree
On the other hand, given a filtration \({\mathcal {F}}=\left( {\mathcal {F}}_{0},\ldots ,{\mathcal {F}}_{T}\right) \) on a finite sample space \(\Omega \) it is possible to define a tree representing the filtration: just consider the sets \(A_{t}\) that collect all atoms generating \({\mathcal {F}}_{t}\) (\({\mathcal {F}}_{t}=\sigma \left( A_{t}\right) \)), define the nodes \({\mathcal {N}}:=\bigcup \nolimits _{t}A_{t}\) and the arcs \(A:=\left\{ \left( a,b\right) :\,a\in A_{t},\,b\in A_{t+1},\,b\subset a\right\} \). \(\left( {\mathcal {N}},A\right) \) then is a directed tree respecting the filtration \({\mathcal {F}}\).
Hence filtrations on a finite sample space and finite trees are equivalent structures up to possibly different labels, and in the following, we will not distinguish between them.
1.3 Measures on trees
Let P be a probability measure on \({\mathcal {F}}_{T}\), such that \(\left( {\mathcal {N}}_{T},{\mathcal {F}}_{T},P\right) \) is a probability space. The notions introduced above allow us to extend the probability measure to the entire tree (cf. Fig. 3). In particular this extension includes the unconditional probabilities
$$\begin{aligned} P\left( n\right) :=P\left( a\left( n\right) \right) \end{aligned}$$
for each node. Furthermore it can be used to define conditional probabilities
$$\begin{aligned} P\left( m\left| \,n\right. \right) :=\frac{P\left( a\left( m\right) \right) }{P\left( a\left( n\right) \right) }, \end{aligned}$$
representing the probability of a transition from n to m, if \(n\in {\mathcal {A}}(m)\).
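The node probabilities \(P(n)=P(a(n))\) and the transition probabilities can be computed from a measure on the leaves by pushing mass up the tree. The following is a sketch of ours (the predecessor-map encoding and all labels are illustrative assumptions):

```python
def node_probabilities(pred, leaf_prob):
    """Sketch: unconditional node probabilities P(n) = P(a(n)) from a
    probability measure on the leaves, plus the conditional transition
    probabilities P(m | n) for each arc (n, m) of the tree."""
    P = dict(leaf_prob)
    # each node accumulates the mass of all leaves in its atom a(n)
    for leaf, p in leaf_prob.items():
        n = pred[leaf]
        while n is not None:
            P[n] = P.get(n, 0.0) + p
            n = pred[n]
    # conditional probabilities along the arcs: P(m | n) = P(m) / P(n)
    cond = {(pred[m], m): P[m] / P[pred[m]]
            for m in pred if pred[m] is not None}
    return P, cond

P, cond = node_probabilities({0: None, 1: 0, 2: 0, 3: 1, 4: 1, 5: 2},
                             {3: 0.2, 4: 0.3, 5: 0.5})
# P(1) = 0.5, P(2) = 0.5, and e.g. the transition probability P(3 | 1) = 0.4
```

Summing the conditional probabilities over the direct successors of any node gives 1, as the atoms of the successors partition the atom of the node.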
1.4 Value and decision processes
In a multi-period, discrete time setup the outcomes or realizations of a stochastic process are of interest, not the concrete model (the sample space): in focus is the state space \(\Xi =\Xi _{0}\times \dots \times \Xi _{T}\) of the stochastic process \(\xi =\left( \xi _{0},\ldots ,\xi _{T}\right) \). The process is measurable with respect to each \({\mathcal {F}}_{t}=\sigma \left( \nu _{t}\right) \), from which it follows (cf. Shiryaev 1996, Theorem II.4.3) that \(\xi \) can be decomposed as \(\xi =\left( \xi _{0}\circ \nu _{0},\ldots ,\xi _{T}\circ \nu _{T}\right) \)
(i.e. \({\text {id}}_{t}\circ \xi =\xi _{t}\circ \nu _{t}\), where \({\text {id}}_{t}:\Xi \rightarrow \Xi _{t}\) is the natural projection). Notice that \(\xi _{t}\in \Xi _{t}\) is an observation of the stochastic process at stage t and measurable with respect to \({\mathcal {F}}_{t}\) (in symbols \(\xi _{t}\lhd {\mathcal {F}}_{t}\)), and at this stage t all prior observations \(\xi _{0:t}:=\left( \xi _{0},\ldots ,\xi _{t}\right) \) are \({\mathcal {F}}_{t}\)-measurable as well.
In multistage stochastic programming, a decision maker has the possibility to influence the results to be expected at the very end of the process by making a decision \(x_{t}\) at any stage t, having available the information observed up to the time when the decision is made, that is \(\xi _{0:t}\). The decision has to be taken prior to the next observation \(\xi _{t+1}\) (e.g., a decision about a new portfolio allocation has to be made before knowing the next day's security prices).
This nonanticipativity property of the decisions is modeled by the assumption that any \(x_{t}\) is measurable with respect to \({\mathcal {F}}_{t}\) (\(x_{t}\lhd {\mathcal {F}}_{t}\)), such that again \(x=\left( x_{0}\circ \nu _{0},\ldots ,x_{T}\circ \nu _{T}\right) \), i.e. \({\text {id}}_{t}\circ x=x_{t}\circ \nu _{t}\).
Kovacevic, R.M., Pichler, A. Tree approximation for discrete time stochastic processes: a process distance approach. Ann Oper Res 235, 395–421 (2015). https://doi.org/10.1007/s10479-015-1994-2
Keywords
- Stochastic processes and trees
- Wasserstein and Kantorovich distance
- Tree approximation
- Optimal transport
- Facility location