Abstract
Approximating stochastic processes by scenario trees is important in decision analysis. In this paper we focus on improving the quality with which a given process is approximated by a smaller, tractable tree. In particular we propose and analyze an iterative algorithm to construct improved approximations: given a stochastic process in discrete time and starting with an arbitrary approximating tree, the algorithm improves both the probabilities on the tree and the related path-values of the smaller tree, leading to significantly improved approximations of the initial stochastic process. The quality of the approximation is measured by the process distance (nested distance), which was introduced recently. For the important case of quadratic process distances the algorithm finds locally best approximating trees in finitely many iterations by generalizing multistage k-means clustering.
Notes
Notice the notational difference: d is the distance function on the original space \(\Xi \), while \({\mathsf {d}}_{r}\) denotes the Wasserstein distance.
In the context of transportation and transportation plans, the paths of the stochastic process are called locations.
The selection has to be chosen in a measurable way.
See also Dupačová et al. (2003), Theorem 2.
An \({\mathcal {F}}\)-measurable set \(a\in {\mathcal {F}}\) is an atom if \(b\subsetneq a\) implies that \(P\left( b\right) =0\).
References
Bally, V., Pagès, G., & Printems, J. (2005). A quantization tree method for pricing and hedging multidimensional American options. Mathematical Finance, 15(1), 119–168.
Beiglböck, M., Goldstern, M., Maresch, G., & Schachermayer, W. (2009). Optimal and better transport plans. Journal of Functional Analysis, 256(6), 1907–1927.
Beiglböck, M., Léonard, C., & Schachermayer, W. (2012). A general duality theorem for the Monge–Kantorovich transport problem. Studia Mathematica, 209, 151–167.
Drezner, Z., & Hamacher, H. W. (2002). Facility location: Applications and theory. New York, NY: Springer.
Dudley, R. M. (1969). The speed of mean Glivenko–Cantelli convergence. The Annals of Mathematical Statistics, 40(1), 40–50.
Dupačová, J., Gröwe-Kuska, N., & Römisch, W. (2003). Scenario reduction in stochastic programming. Mathematical Programming, Series A, 95(3), 493–511.
Durrett, R. A. (2004). Probability: Theory and examples (2nd ed.). Belmont, CA: Duxbury Press.
Graf, S., & Luschgy, H. (2000). Foundations of quantization for probability distributions (vol. 1730), Lecture notes in mathematics. Berlin, Heidelberg: Springer.
Heitsch, H., & Römisch, W. (2003). Scenario reduction algorithms in stochastic programming. Computational Optimization and Applications, 24(2–3), 187–206.
Heitsch, H., & Römisch, W. (2007). A note on scenario reduction for two-stage stochastic programs. Operations Research Letters, 35(6), 731–738.
Heitsch, H., & Römisch, W. (2009a). Scenario tree modeling for multistage stochastic programs. Mathematical Programming Series A, 118, 371–406.
Heitsch, H., & Römisch, W. (2009b). Scenario tree reduction for multistage stochastic programs. Computational Management Science, 6(2), 117–133.
Heitsch, H., & Römisch, W. (2011). Stability and scenario trees for multistage stochastic programs. In G. Infanger (Ed.), Stochastic programming, volume 150 of international series in operations research & management science, pp. 139–164. New York: Springer.
Heitsch, H., Römisch, W., & Strugarek, C. (2006). Stability of multistage stochastic programs. SIAM Journal on Optimization, 17(2), 511–525.
Høyland, K., & Wallace, S. W. (2001). Generating scenario trees for multistage decision problems. Management Science, 47, 295–307.
King, A. J., & Wallace, S. W. (2013). Modeling with stochastic programming, volume XVI of Springer Series in Operations Research and Financial Engineering. Berlin: Springer.
Lloyd, S. P. (1982). Least squares quantization in PCM. IEEE Transactions on Information Theory, 28(2), 129–137.
Nocedal, J. (1980). Updating quasi-Newton matrices with limited storage. Mathematics of Computation, 35(151), 773–782.
Pflug, G. C., & Römisch, W. (2007). Modeling, measuring and managing risk. River Edge, NJ: World Scientific.
Pflug, G. C. (2009). Version-independence and nested distribution in multistage stochastic optimization. SIAM Journal on Optimization, 20, 1406–1420.
Pflug, G. C., & Pichler, A. (2012). A distance for multistage stochastic optimization models. SIAM Journal on Optimization, 22(1), 1–23.
Pichler, A. (2013). Evaluations of risk measures for different probability measures. SIAM Journal on Optimization, 23(1), 530–551.
Rachev, S. T. (1991). Probability metrics and the stability of stochastic models. West Sussex: Wiley.
Rachev, S. T., & Rüschendorf, L. (1998). Mass transportation problems vol. I: Theory, vol. II: Applications, volume XXV of Probability and its applications. New York: Springer.
Römisch, W. (2003). Stability of stochastic programming problems. In A. Ruszczyński & A. Shapiro (Eds.), Stochastic programming, handbooks in operations research and management science, volume 10, chapter 8. Amsterdam: Elsevier.
Ruszczyński, A. (2006). Nonlinear optimization. Princeton: Princeton University Press.
Schachermayer, W., & Teichmann, J. (2009). Characterization of optimal transport plans for the Monge–Kantorovich problem. Proceedings of the American Mathematical Society, 137(2), 519–529.
Shapiro, A. (2010). Computational complexity of stochastic programming: Monte Carlo sampling approach. In Proceedings of the international congress of mathematicians, pp. 2979–2995, Hyderabad, India.
Shapiro, A., & Nemirovski, A. (2005). On complexity of stochastic programming problems. In V. Jeyakumar & A. M. Rubinov (Eds.), Continuous optimization: Current trends and applications (pp. 111–144). Berlin: Springer.
Shiryaev, A. N. (1996). Probability. New York: Springer.
Vershik, A. M. (2006). Kantorovich metric: Initial history and little-known applications. Journal of Mathematical Sciences, 133(4), 1410–1417.
Villani, C. (2003). Topics in optimal transportation (vol. 58), Graduate Studies in Mathematics. Providence, RI: American Mathematical Society.
Villani, C. (2009). Optimal transport, old and new (vol. 338), Grundlehren der Mathematischen Wissenschaften. Berlin: Springer.
Williams, D. (1991). Probability with martingales. Cambridge: Cambridge University Press.
Acknowledgments
We wish to thank two anonymous referees for their constructive criticism and their dedication in reviewing the paper. Their valuable comments significantly improved the content and the presentation. Parts of this paper are addressed in the book Multistage Stochastic Optimization (Springer) by Pflug and Pichler, which also covers many more topics in multistage stochastic optimization and which had to be completed before final acceptance of this paper.
Ethics declarations
Funding
This research was partially funded by the Austrian science fund FWF, project P 24125-N13 and by the Research Council of Norway, Grant 207690/E20.
Additional information
Raimund M. Kovacevic: This research was partially funded by the Austrian science fund FWF, project P 24125-N13.
Alois Pichler: The author gratefully acknowledges support of the Research Council of Norway (Grant 207690/E20).
Appendices
Appendix 1: Scenario approximation with Wasserstein distances
Given a probability measure P we ask for an approximating probability measure located on \(\Xi ^{\prime }\), that is to say, a measure whose support is contained in \(\Xi ^{\prime }\). The following proposition reveals that the pushforward measure \(P^{{\mathbf {T}}}\), where the mapping \({\mathbf {T}}\) is defined in (ii) of the proposition, is the best approximation of P among all measures located on \(\Xi ^{\prime }\); that is, \(P^{{\mathbf {T}}}\) attains the lower bound (28) below.
Proposition 1
(Lower bounds and best approximation) Let P and \(P^{\prime }\) be probability measures.
(i)
The Wasserstein distance has the lower bound
$$\begin{aligned} \int \nolimits _{\Xi }\min _{\xi ^{\prime }\in \Xi ^{\prime }}d\left( \xi ,\xi ^{\prime }\right) ^{r}P\left( {\mathrm {d}}\xi \right) \le {\mathsf {d}}_{r}\left( P,\,P^{\prime }\right) ^{r}. \end{aligned}$$(28)
(ii)
The lower bound in (28) is attained for the pushforward measure \(P^{{\mathbf {T}}}:=P\circ {\mathbf {T}}^{-1}\) on \(\Xi ^{\prime }\), where the transport map \({\mathbf {T}}:\Xi \rightarrow \Xi ^{\prime }\) is defined by (see footnote 3)
$$\begin{aligned} {\mathbf {T}}\left( \xi \right) \in \mathop {\hbox {argmin}}\limits _{\xi ^{\prime }\in \Xi ^{\prime }}d\left( \xi ,\xi ^{\prime }\right) . \end{aligned}$$
It holds that (see footnote 4)
$$\begin{aligned} {\mathsf {d}}_{r}\left( P,\,P^{{\mathbf {T}}}\right) ^{r}=\int \min _{\xi ^{\prime }\in \Xi ^{\prime }}d\left( \xi ,\xi ^{\prime }\right) ^{r}P\left( {\mathrm {d}}\xi \right) ={\mathbb {E}}\left[ d\left( {\text {id}}_{\Xi },{\mathbf {T}}\left( {\text {id}}_{\Xi }\right) \right) ^{r}\right] , \end{aligned}$$where the identity \({\text {id}}_{\Xi }\left( \xi \right) =\xi \) on \(\Xi \) is employed for notational convenience.
(iii)
If \(\Xi =\Xi ^{\prime }\) is a vector space and \({\mathbf {T}}\) as in (ii), then
$$\begin{aligned} {\mathsf {d}}_{r}\left( P,\,P^{\tilde{{\mathbf {T}}}}\right) \le {\mathsf {d}}_{r}\left( P,\,P^{{\mathbf {T}}}\right) , \end{aligned}$$where \(\tilde{{\mathbf {T}}}\) is defined by \(\tilde{{\mathbf {T}}}\left( \xi \right) :={\mathbb {E}}_{P}\left[ \tilde{\xi }\left| \,{\mathbf {T}}\left( \tilde{\xi }\right) ={\mathbf {T}}\left( \xi \right) \right. \right] \).
Proof
Let \(\pi \) have the marginals P and \(P^{\prime }\). Then
$$\begin{aligned} \int \int d\left( \xi ,\xi ^{\prime }\right) ^{r}\pi \left( {\mathrm {d}}\xi ,{\mathrm {d}}\xi ^{\prime }\right) \ge \int \nolimits _{\Xi }\min _{\xi ^{\prime }\in \Xi ^{\prime }}d\left( \xi ,\xi ^{\prime }\right) ^{r}P\left( {\mathrm {d}}\xi \right) . \end{aligned}$$
Taking the infimum over \(\pi \) reveals the lower bound (28).
Define the transport plan \(\pi :=P\circ \left( {\text {id}}_{\Xi }\times {\mathbf {T}}\right) ^{-1}\) by employing the transport map \({\mathbf {T}}\). Then
$$\begin{aligned} \int \int d\left( \xi ,\xi ^{\prime }\right) ^{r}\pi \left( {\mathrm {d}}\xi ,{\mathrm {d}}\xi ^{\prime }\right) =\int d\left( \xi ,{\mathbf {T}}\left( \xi \right) \right) ^{r}P\left( {\mathrm {d}}\xi \right) =\int \min _{\xi ^{\prime }\in \Xi ^{\prime }}d\left( \xi ,\xi ^{\prime }\right) ^{r}P\left( {\mathrm {d}}\xi \right) . \end{aligned}$$
Moreover \(\pi \) is feasible, as it has the marginals \(\pi \left( A\times \Xi ^{\prime }\right) =P\left( \left\{ \xi :\xi \in A,\,{\mathbf {T}}\left( \xi \right) \in \Xi ^{\prime }\right\} \right) =P\left( A\right) \) and \(\pi \left( \Xi \times B\right) =P\left( \left\{ \xi :{\mathbf {T}}\left( \xi \right) \in B\right\} \right) =P^{{\mathbf {T}}}\left( B\right) \). For this measure \(\pi \) thus
$$\begin{aligned} {\mathsf {d}}_{r}\left( P,\,P^{{\mathbf {T}}}\right) ^{r}\le \int \min _{\xi ^{\prime }\in \Xi ^{\prime }}d\left( \xi ,\xi ^{\prime }\right) ^{r}P\left( {\mathrm {d}}\xi \right) , \end{aligned}$$
which, together with the lower bound (28), proves (ii).
For the last assertion apply the conditional Jensen’s inequality (cf., e.g., Williams 1991) \(\varphi \left( {\mathbb {E}}\left( X|{\mathbf {T}}\right) \right) \le {\mathbb {E}}\left( \varphi \left( X\right) |{\mathbf {T}}\right) \) to the convex mapping \(\varphi :y\mapsto d\left( \xi ,y\right) ^{r}\) and obtain
The measure \(\tilde{\pi }\left( A\times B\right) :=P\left( A\cap \tilde{{\mathbf {T}}}^{-1}\left( B\right) \right) \) has marginals P and \(P^{\tilde{{\mathbf {T}}}}\), from which it follows that
which is the assertion. \(\square \)
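To illustrate Proposition 1 (ii) numerically, here is a minimal sketch of ours (not from the paper): for a discrete measure on the real line, the nearest-point map \({\mathbf {T}}\) is computed, its pushforward probabilities accumulated, and the attained distance \({\mathsf {d}}_{r}(P,P^{{\mathbf {T}}})^{r}={\mathbb {E}}\,d({\text {id}},{\mathbf {T}})^{r}\) evaluated. The one-dimensional setting and all names are illustrative assumptions.

```python
import numpy as np

def pushforward(xi, p, supp, r=2):
    """Sketch: map each atom xi_i of P = sum_i p_i * delta_{xi_i} to its
    nearest point in `supp` (the role of Xi') and accumulate probabilities.
    Returns (p_star, dist_r) with dist_r = E[d(id, T(id))^r]."""
    xi = np.asarray(xi, dtype=float)
    p = np.asarray(p, dtype=float)
    supp = np.asarray(supp, dtype=float)
    # T(xi_i) = argmin_{q in supp} |xi_i - q|
    j = np.abs(xi[:, None] - supp[None, :]).argmin(axis=1)
    p_star = np.zeros(len(supp))
    np.add.at(p_star, j, p)                       # pushforward probabilities
    dist_r = float(np.sum(p * np.abs(xi - supp[j]) ** r))
    return p_star, dist_r

p_star, d2 = pushforward([0.0, 0.4, 1.0], [0.25, 0.5, 0.25], [0.0, 1.0])
# the atom at 0.4 is mapped to 0.0, so p_star = [0.75, 0.25] and d2 = 0.5 * 0.4**2
```

By Proposition 1 no measure supported on the same two points can come closer to P in the Wasserstein sense.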
As addressed in the introduction, the approximation can be improved by relocating the scenarios themselves and by allocating adapted probabilities to these scenarios. The following two sections treat these issues by applying Proposition 1.
1.1 Optimal probabilities
The optimal measure \(P^{{\mathbf {T}}}\) in Proposition 1 notably does not depend on the order r. Moreover, given a probability measure P, Proposition 1 (ii) allows us to find the best approximation located on finitely many points \(Q=\left\{ q_{1},\dots ,q_{n}\right\} \). The points \(q_{j}\in Q\) are often called quantizers, and we adopt this notion in what follows (see the œuvre of Gilles Pagès, e.g., Bally et al. (2005), for a comprehensive treatment).
Consider now \(\Xi ^{\prime }:=Q\) and define \(p_{j}^{*}:=P\left( {\mathbf {T}}=q_{j}\right) \); the collection of distinct sets \(\left\{ {\mathbf {T}}=q_{j}\right\} \) is a tessellation of \(\Xi \) (a Voronoi tessellation, see Graf and Luschgy 2000). Set \(P^{Q}:=P^{{\mathbf {T}}}=\sum \nolimits _{j}p_{j}^{*}\cdot \delta _{q_{j}}\), as above. Then \({\mathsf {d}}_{r}\left( P,\,P^{Q}\right) ^{r}=\int \min _{q\in Q}d\left( \xi ,q\right) ^{r}P\left( {\mathrm {d}}\xi \right) \), and no better approximation is possible by Proposition 1.
According to Proposition 1 the best approximating measure for \(P=\sum \nolimits _{i}p_{i}\delta _{\xi _{i}}\), which is located on Q, is given by \(P^{Q}=\sum \nolimits _{j}p_{j}^{*}\delta _{q_{j}}\). For a discrete measure this can be formulated as a linear program,
$$\begin{aligned} \min \left\{ \sum \nolimits _{i,j}d\left( \xi _{i},q_{j}\right) ^{r}\pi _{i,j}:\;\pi _{i,j}\ge 0,\ \sum \nolimits _{j}\pi _{i,j}=p_{i}\right\} , \end{aligned}$$
which is solved by the optimal transport plan
$$\begin{aligned} \pi _{i,j}^{*}:={\left\{ \begin{array}{ll} p_{i} &{}\quad \text {if }d\left( \xi _{i},q_{j}\right) =\min \nolimits _{j^{\prime }}d\left( \xi _{i},q_{j^{\prime }}\right) ,\\ 0 &{}\quad \text {otherwise} \end{array}\right. } \end{aligned}$$(29)
(choosing one minimizer j per i in case of ties), such that
$$\begin{aligned} p_{j}^{*}=\sum \nolimits _{i}\pi _{i,j}^{*}. \end{aligned}$$(30)
Observe as well that the matrix \(\pi ^{*}\) in (29) has just \(\left| \Xi \right| \) non-zero entries, as in every row i of \(\pi ^{*}\) there is just one non-zero entry \(\pi _{i,j}^{*}\). This is a simplification in comparison with Remark 2, as the solution \(\pi \) of (4) has \(\left| \Xi \right| +\left| \Xi ^{\prime }\right| -1\) non-zero entries, if the probability measure \(P^{\prime }\) is specified.
Finally, given the support points Q, it is an easy exercise to look up the closest points according to (29) and to sum up their probabilities according to (30), such that the solution of (27), the closest measure to P located on Q, is immediately obtained as \(P^{Q}=\sum \nolimits _{j}p_{j}^{*}\delta _{q_{j}}\).
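The sparsity of the optimal plan can be made concrete in a small sketch of ours (one-dimensional data and all names are illustrative assumptions): each row i of \(\pi ^{*}\) carries its whole mass \(p_{i}\) in the single column of the nearest quantizer, so the plan has exactly \(\left| \Xi \right| \) non-zero entries and its column sums give \(p_{j}^{*}\).

```python
import numpy as np

def optimal_plan(xi, p, q):
    """Sketch of the plan in (29): pi*_{i,j} = p_i if q_j is nearest to
    xi_i (ties broken by the first minimizer), 0 otherwise."""
    xi, q = np.asarray(xi, dtype=float), np.asarray(q, dtype=float)
    j = np.abs(xi[:, None] - q[None, :]).argmin(axis=1)
    pi = np.zeros((len(xi), len(q)))
    pi[np.arange(len(xi)), j] = p                 # one non-zero entry per row
    return pi

pi = optimal_plan([0.0, 0.4, 0.9, 1.0], [0.4, 0.3, 0.2, 0.1], [0.0, 1.0])
# column sums give p*_j as in (30); the number of non-zeros equals |Xi| = 4
```

This contrasts with a fully specified second marginal, where an optimal plan may need \(\left| \Xi \right| +\left| \Xi ^{\prime }\right| -1\) non-zero entries.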
1.2 Optimal supporting points—facility location
Given the previous results on optimal probabilities, the problem of finding a sufficiently good approximation of P in the Wasserstein distance is reduced to the problem of finding good locations Q, that is, to minimize the function
$$\begin{aligned} \left( q_{1},\dots ,q_{n}\right) \mapsto \int \min \nolimits _{j}d\left( \xi ,q_{j}\right) ^{r}P\left( {\mathrm {d}}\xi \right) . \end{aligned}$$(33)
Minimizing (33) with respect to the quantizers \(\left\{ q_{1},\dots ,q_{n}\right\} \) is often referred to as facility location, as in Drezner and Hamacher (2002). This problem is not convex, and no closed form solution exists in general; it hence has to be handled with adequate numerical algorithms. Moreover, it is well known that facility location problems are NP-hard.
For the important case of the quadratic Wasserstein distance, Proposition 1 (iii) and its proof give rise to an adaptation of the k-means clustering algorithm [also referred to as Lloyd's algorithm, cf. Lloyd (1982)], which is described in Algorithm 2. In this case the conditional average is the best approximation in terms of the Euclidean norm, such that the algorithm terminates after finitely many iterations at a local minimum.
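The alternation just described can be sketched in a few lines. The following is our single-stage, one-dimensional simplification of the idea behind Algorithm 2 (it is not the multistage algorithm of the paper; all names are illustrative): step (a) reassigns atoms to their nearest quantizer as in Proposition 1 (ii), step (b) replaces each quantizer by the conditional mean of its Voronoi cell as in Proposition 1 (iii), and the loop stops once the assignment is stable.

```python
import numpy as np

def lloyd(xi, p, q0, max_iter=100):
    """Sketch of a Lloyd-type iteration for the quadratic Wasserstein
    distance: alternate nearest-quantizer assignment and conditional
    means until the Voronoi assignment no longer changes."""
    xi = np.asarray(xi, dtype=float)
    p = np.asarray(p, dtype=float)
    q = np.asarray(q0, dtype=float).copy()
    for _ in range(max_iter):
        # (a) Voronoi assignment: nearest quantizer for every atom
        j = np.abs(xi[:, None] - q[None, :]).argmin(axis=1)
        # (b) conditional means of the cells (k-means centroid update)
        q_new = q.copy()
        for k in range(len(q)):
            mask = j == k
            if p[mask].sum() > 0:
                q_new[k] = (p[mask] * xi[mask]).sum() / p[mask].sum()
        if np.allclose(q_new, q):                 # assignment is stable
            break
        q = q_new
    p_star = np.zeros(len(q))
    np.add.at(p_star, j, p)                       # optimal probabilities
    return q, p_star

q, p_star = lloyd([0.0, 1.0, 9.0, 10.0], [0.25, 0.25, 0.25, 0.25], [0.0, 10.0])
# the two clusters {0, 1} and {9, 10} yield q = [0.5, 9.5], p* = [0.5, 0.5]
```

Since only finitely many Voronoi assignments exist, the loop terminates; this mirrors the finite-termination argument of Theorem 4.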
Theorem 4
The measures \(P^{k}\) generated by Algorithm 2 are improved approximations of P; they satisfy \({\mathsf {d}}_{r}\left( P,\,P^{k+1}\right) \le {\mathsf {d}}_{r}\left( P,\,P^{k}\right) \),
and the algorithm terminates after finitely many iterations.
In the case of the quadratic Wasserstein distance Algorithm 2 terminates at a local minimum \(\left\{ q_{1},\dots q_{n}\right\} \) of (33).
Proof
Algorithm 2 is an iterative refinement technique which finds the measure \(P^{k}\) after k iterations. By construction of (32) it is an improvement due to Proposition 1 (ii) and (iii), and hence \({\mathsf {d}}_{r}\left( P,\,P^{k+1}\right) \le {\mathsf {d}}_{r}\left( P,\,P^{k}\right) \).
The algorithm terminates after finitely many iterations because there are just finitely many Voronoi-combinations \(T_{j}\).
For the Euclidean distance and \(r=2\) the expectation \({\mathbb {E}}\left( \xi \right) =\sum \nolimits _{i}p_{i}\xi _{i}\) minimizes the function
$$\begin{aligned} y\mapsto \sum \nolimits _{i}p_{i}\left\| \xi _{i}-y\right\| ^{2}. \end{aligned}$$
In this case \(P^{k}\) thus is a local minimum of (33). \(\square \)
For distances other than the quadratic Wasserstein distance, \(P^{k}\) is possibly a good starting point for solving (33), but in general it is not a local (or global) minimum.
Appendix 2: Stochastic processes and trees
1.1 Any tree induces a filtration
Any tree with height T and finitely many nodes \({\mathcal {N}}\) naturally induces a filtration \({\mathcal {F}}\): first use \({\mathcal {N}}_{T}\) as sample space. For any \(n\in {\mathcal {N}}\) define the atom (see footnote 5) \(a\left( n\right) \subset {\mathcal {N}}_{T}\) in a backward recursive way: \(a\left( n\right) :=\left\{ n\right\} \) for \(n\in {\mathcal {N}}_{T}\), and \(a\left( n\right) :=\bigcup \left\{ a\left( m\right) :\,m\text { is a direct successor of }n\right\} \) otherwise.
Employing these atoms, the related sigma algebras are defined by \({\mathcal {F}}_{t}:=\sigma \left( \left\{ a\left( n\right) :\,n\in {\mathcal {N}}_{t}\right\} \right) \).
From the construction of the atoms it is evident that \({\mathcal {F}}_{0}=\left\{ \emptyset ,\,{\mathcal {N}}_{T}\right\} \) for a rooted tree and that \({\mathcal {F}}=\left( {\mathcal {F}}_{0},\ldots ,{\mathcal {F}}_{T}\right) \) is a filtration on the sample space \({\mathcal {N}}_{T}\), i.e. it holds that \({\mathcal {F}}_{t}\subset {\mathcal {F}}_{t+1}\). Notice that node m is a predecessor of n, i.e. \(m\in {\mathcal {A}}(n)\), if and only if \(a\left( n\right) \subset a\left( m\right) \).
Employing the atoms \(a\left( n\right) \) a tree process can be defined by
$$\begin{aligned} \nu _{t}\left( i\right) :=n\quad \text {whenever }i\in a\left( n\right) \text { and }n\in {\mathcal {N}}_{t}, \end{aligned}$$
such that each \(\nu _{t}:{\mathcal {N}}_{T}\rightarrow {\mathcal {N}}_{t}\) is \({\mathcal {F}}_{t}\)-measurable. Moreover, the process \(\nu =\left( \nu _{0},\ldots ,\nu _{T}\right) \) is adapted to its natural filtration, i.e. \({\mathcal {F}}_{t}=\sigma \left( \nu _{t}\right) \).
It is natural to introduce the notation \(i_{t}:=\nu _{t}\left( i\right) \), which denotes the state of the tree process for any final outcome \(i\in {\mathcal {N}}_{T}\) at stage t. It then holds that \(i_{T}=i\), moreover \(i_{t}\in {\mathcal {A}}(i_{\tau })\) whenever \(t\le \tau \), and finally, for a rooted tree, \(i_{0}=0\). The sample path from the root node 0 to a final node \(i\in {\mathcal {N}}_{T}\) is \(\left( i_{0},i_{1},\ldots ,i_{T}\right) =\left( 0,i_{1},\ldots ,i\right) \).
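The atoms \(a(n)\) and the sample paths \((i_0,\ldots ,i_T)\) can be computed for a concrete small tree. The following sketch is our own illustration (the predecessor-map encoding and the node labels are assumptions, not the paper's notation): each node stores its direct predecessor, the path of a leaf is obtained by walking up to the root, and the atom of a node collects all leaves passing through it.

```python
def atoms_and_paths(pred, leaves):
    """pred[n] = direct predecessor of node n (the root maps to None).
    Returns the atoms a(n) as sets of leaves and the sample paths."""
    def path(i):
        p = [i]
        while pred[p[-1]] is not None:
            p.append(pred[p[-1]])
        return list(reversed(p))                  # (i_0, ..., i_T), i_0 = root
    paths = {i: path(i) for i in leaves}
    atom = {}
    for i, pth in paths.items():
        for n in pth:
            atom.setdefault(n, set()).add(i)      # leaf i belongs to a(n)
    return atom, paths

# a rooted tree with T = 2: root 0, inner nodes 1, 2, leaves 3, 4, 5
pred = {0: None, 1: 0, 2: 0, 3: 1, 4: 1, 5: 2}
atom, paths = atoms_and_paths(pred, [3, 4, 5])
# a(0) = {3, 4, 5}, a(1) = {3, 4}, and the path of leaf 5 is [0, 2, 5]
```

The nesting of the atoms along each path reflects exactly the inclusion \(a(n)\subset a(m)\) for a predecessor m of n.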
1.2 Any filtration induces a tree
On the other hand, given a filtration \({\mathcal {F}}=\left( {\mathcal {F}}_{0},\ldots ,{\mathcal {F}}_{T}\right) \) on a finite sample space \(\Omega \) it is possible to define a tree representing the filtration: just consider the sets \(A_{t}\) that collect all atoms generating \({\mathcal {F}}_{t}\) (\({\mathcal {F}}_{t}=\sigma \left( A_{t}\right) \)), define the nodes \({\mathcal {N}}:=\bigcup \nolimits _{t}A_{t}\) and the arcs \(A:=\left\{ \left( a,b\right) :\,a\in A_{t},\,b\in A_{t+1},\,b\subset a\right\} \). \(\left( {\mathcal {N}},A\right) \) then is a directed tree respecting the filtration \({\mathcal {F}}\).
Hence filtrations on a finite sample space and finite trees are equivalent structures up to possibly different labels, and in the following, we will not distinguish between them.
1.3 Measures on trees
Let P be a probability measure on \({\mathcal {F}}_{T}\), such that \(\left( {\mathcal {N}}_{T},{\mathcal {F}}_{T},P\right) \) is a probability space. The notions introduced above allow us to extend the probability measure to the entire tree (cf. Fig. 3). In particular this extension includes the unconditional probabilities
$$\begin{aligned} P\left( n\right) :=P\left( a\left( n\right) \right) \end{aligned}$$
for each node. Furthermore it can be used to define conditional probabilities
$$\begin{aligned} P\left( m\left| \,n\right. \right) :=\frac{P\left( a\left( m\right) \right) }{P\left( a\left( n\right) \right) }, \end{aligned}$$
representing the probability of a transition from n to m, if \(n\in {\mathcal {A}}(m)\).
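The node probabilities \(P(n)=P(a(n))\) and the transition probabilities can be computed from a measure on the leaves by pushing mass up the tree. The following is a sketch of ours (the predecessor-map encoding and all labels are illustrative assumptions):

```python
def node_probabilities(pred, leaf_prob):
    """Sketch: unconditional node probabilities P(n) = P(a(n)) from a
    probability measure on the leaves, plus the conditional transition
    probabilities P(m | n) for each arc (n, m) of the tree."""
    P = dict(leaf_prob)
    # each node accumulates the mass of all leaves in its atom a(n)
    for leaf, p in leaf_prob.items():
        n = pred[leaf]
        while n is not None:
            P[n] = P.get(n, 0.0) + p
            n = pred[n]
    # conditional probabilities along the arcs: P(m | n) = P(m) / P(n)
    cond = {(pred[m], m): P[m] / P[pred[m]]
            for m in pred if pred[m] is not None}
    return P, cond

P, cond = node_probabilities({0: None, 1: 0, 2: 0, 3: 1, 4: 1, 5: 2},
                             {3: 0.2, 4: 0.3, 5: 0.5})
# P(1) = 0.5, P(2) = 0.5, and e.g. the transition probability P(3 | 1) = 0.4
```

Summing the conditional probabilities over the direct successors of any node gives 1, as the atoms of the successors partition the atom of the node.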
1.4 Value and decision processes
In a multi-period, discrete time setup the outcomes or realizations of a stochastic process are of interest, not the concrete model (the sample space): in focus is the state space \(\Xi =\Xi _{0}\times \dots \times \Xi _{T}\) of the stochastic process \(\xi =\left( \xi _{0},\ldots ,\xi _{T}\right) \). The process is measurable with respect to each \({\mathcal {F}}_{t}=\sigma \left( \nu _{t}\right) \), from which it follows (cf. Shiryaev 1996, Theorem II.4.3) that \(\xi \) can be decomposed as \(\xi =\left( \xi _{0}\circ \nu _{0},\ldots ,\xi _{T}\circ \nu _{T}\right) \)
(i.e. \({\text {id}}_{t}\circ \xi =\xi _{t}\circ \nu _{t}\), where \({\text {id}}_{t}:\Xi \rightarrow \Xi _{t}\) is the natural projection). Notice that \(\xi _{t}\in \Xi _{t}\) is an observation of the stochastic process at stage t and measurable with respect to \({\mathcal {F}}_{t}\) (in symbols \(\xi _{t}\lhd {\mathcal {F}}_{t}\)), and at this stage t all prior observations \(\xi _{0:t}:=\left( \xi _{0},\ldots ,\xi _{t}\right) \) are \({\mathcal {F}}_{t}\)-measurable as well.
In multistage stochastic programming, a decision maker has the possibility to influence the results to be expected at the very end of the process by making a decision \(x_{t}\) at any stage t, having available the information observed up to the time when the decision is made, that is \(\xi _{0:t}\). The decision has to be taken prior to the next observation \(\xi _{t+1}\) (e.g., a decision about a new portfolio allocation has to be made before knowing the next day's security prices).
This nonanticipativity property of the decisions is modeled by the assumption that any \(x_{t}\) is measurable with respect to \({\mathcal {F}}_{t}\) (\(x_{t}\lhd {\mathcal {F}}_{t}\)), such that again \(x=\left( x_{0}\circ \nu _{0},\ldots ,x_{T}\circ \nu _{T}\right) \), i.e. \({\text {id}}_{t}\circ x=x_{t}\circ \nu _{t}\).
Kovacevic, R.M., Pichler, A. Tree approximation for discrete time stochastic processes: a process distance approach. Ann Oper Res 235, 395–421 (2015). https://doi.org/10.1007/s10479-015-1994-2
Keywords
- Stochastic processes and trees
- Wasserstein and Kantorovich distance
- Tree approximation
- Optimal transport
- Facility location