
Tree approximation for discrete time stochastic processes: a process distance approach


Abstract

Approximating stochastic processes by scenario trees is important in decision analysis. In this paper we focus on improving the approximation of stochastic processes by smaller, tractable trees. In particular we propose and analyze an iterative algorithm to construct improved approximations: given a stochastic process in discrete time and starting with an arbitrary approximating tree, the algorithm improves both the probabilities on the tree and the related path-values of the smaller tree, leading to significantly improved approximations of the initial stochastic process. The quality of the approximation is measured by the process distance (nested distance), which was introduced recently. For the important case of quadratic process distances the algorithm finds locally best approximating trees in finitely many iterations by generalizing multistage k-means clustering.


Notes

  1. Notice the notational difference: d is the distance function on the original space \(\Xi \), while \({\mathsf {d}}_{r}\) denotes the Wasserstein distance.

  2. In the context of transportation and transportation plans, the paths of the stochastic process are called locations.

  3. The selection has to be chosen in a measurable way.

  4. See also Dupačová et al. (2003), Theorem 2.

  5. An \({\mathcal {F}}\)-measurable set \(a\in {\mathcal {F}}\) is an atom if \(b\in {\mathcal {F}}\) and \(b\subsetneq a\) imply that \(P\left( b\right) =0\).

References

  • Bally, V., Pagès, G., & Printems, J. (2005). A quantization tree method for pricing and hedging multidimensional American options. Mathematical Finance, 15(1), 119–168.

  • Beiglböck, M., Goldstern, M., Maresch, G., & Schachermayer, W. (2009). Optimal and better transport plans. Journal of Functional Analysis, 256(6), 1907–1927.

  • Beiglböck, M., Léonard, C., & Schachermayer, W. (2012). A general duality theorem for the Monge–Kantorovich transport problem. Studia Mathematica, 209, 151–167.

  • Drezner, Z., & Hamacher, H. W. (2002). Facility location: Applications and theory. New York, NY: Springer.

  • Dudley, R. M. (1969). The speed of mean Glivenko–Cantelli convergence. The Annals of Mathematical Statistics, 40(1), 40–50.

  • Dupačová, J., Gröwe-Kuska, N., & Römisch, W. (2003). Scenario reduction in stochastic programming. Mathematical Programming, Series A, 95(3), 493–511.

  • Durrett, R. (2004). Probability: Theory and examples (2nd ed.). Belmont, CA: Duxbury Press.

  • Graf, S., & Luschgy, H. (2000). Foundations of quantization for probability distributions. Lecture Notes in Mathematics (Vol. 1730). Berlin, Heidelberg: Springer.

  • Heitsch, H., & Römisch, W. (2003). Scenario reduction algorithms in stochastic programming. Computational Optimization and Applications, 24(2–3), 187–206.

  • Heitsch, H., & Römisch, W. (2007). A note on scenario reduction for two-stage stochastic programs. Operations Research Letters, 35(6), 731–738.

  • Heitsch, H., & Römisch, W. (2009a). Scenario tree modeling for multistage stochastic programs. Mathematical Programming, Series A, 118, 371–406.

  • Heitsch, H., & Römisch, W. (2009b). Scenario tree reduction for multistage stochastic programs. Computational Management Science, 6(2), 117–133.

  • Heitsch, H., & Römisch, W. (2011). Stability and scenario trees for multistage stochastic programs. In G. Infanger (Ed.), Stochastic programming, International Series in Operations Research & Management Science (Vol. 150, pp. 139–164). New York: Springer.

  • Heitsch, H., Römisch, W., & Strugarek, C. (2006). Stability of multistage stochastic programs. SIAM Journal on Optimization, 17(2), 511–525.

  • Høyland, K., & Wallace, S. W. (2001). Generating scenario trees for multistage decision problems. Management Science, 47, 295–307.

  • King, A. J., & Wallace, S. W. (2013). Modeling with stochastic programming. Springer Series in Operations Research and Financial Engineering. Berlin: Springer.

  • Lloyd, S. P. (1982). Least squares quantization in PCM. IEEE Transactions on Information Theory, 28(2), 129–137.

  • Nocedal, J. (1980). Updating quasi-Newton matrices with limited storage. Mathematics of Computation, 35(151), 773–782.

  • Pflug, G. C. (2009). Version-independence and nested distributions in multistage stochastic optimization. SIAM Journal on Optimization, 20, 1406–1420.

  • Pflug, G. C., & Pichler, A. (2012). A distance for multistage stochastic optimization models. SIAM Journal on Optimization, 22(1), 1–23.

  • Pflug, G. C., & Römisch, W. (2007). Modeling, measuring and managing risk. River Edge, NJ: World Scientific.

  • Pichler, A. (2013). Evaluations of risk measures for different probability measures. SIAM Journal on Optimization, 23(1), 530–551.

  • Rachev, S. T. (1991). Probability metrics and the stability of stochastic models. West Sussex: Wiley.

  • Rachev, S. T., & Rüschendorf, L. (1998). Mass transportation problems. Vol. I: Theory, Vol. II: Applications. Probability and Its Applications. New York: Springer.

  • Römisch, W. (2003). Stability of stochastic programming problems. In A. Ruszczyński & A. Shapiro (Eds.), Stochastic programming, Handbooks in Operations Research and Management Science (Vol. 10, Chap. 8). Amsterdam: Elsevier.

  • Ruszczyński, A. (2006). Nonlinear optimization. Princeton: Princeton University Press.

  • Schachermayer, W., & Teichmann, J. (2009). Characterization of optimal transport plans for the Monge–Kantorovich problem. Proceedings of the American Mathematical Society, 137(2), 519–529.

  • Shapiro, A. (2010). Computational complexity of stochastic programming: Monte Carlo sampling approach. In Proceedings of the International Congress of Mathematicians (pp. 2979–2995). Hyderabad, India.

  • Shapiro, A., & Nemirovski, A. (2005). On complexity of stochastic programming problems. In V. Jeyakumar & A. M. Rubinov (Eds.), Continuous optimization: Current trends and applications (pp. 111–144). Berlin: Springer.

  • Shiryaev, A. N. (1996). Probability. New York: Springer.

  • Vershik, A. M. (2006). Kantorovich metric: Initial history and little-known applications. Journal of Mathematical Sciences, 133(4), 1410–1417.

  • Villani, C. (2003). Topics in optimal transportation. Graduate Studies in Mathematics (Vol. 58). Providence, RI: American Mathematical Society.

  • Villani, C. (2009). Optimal transport, old and new. Grundlehren der mathematischen Wissenschaften (Vol. 338). Berlin: Springer.

  • Williams, D. (1991). Probability with martingales. Cambridge: Cambridge University Press.


Acknowledgments

We wish to thank the two anonymous referees for their constructive criticism and their dedication in reviewing the paper. Their valuable comments significantly improved the content and the presentation. Parts of this paper are addressed in the book Multistage Stochastic Optimization (Springer) by Pflug and Pichler, which also covers many more topics in multistage stochastic optimization and which had to be completed before the final acceptance of this paper.

Author information

Correspondence to Raimund M. Kovacevic.

Ethics declarations

Funding

This research was partially funded by the Austrian Science Fund (FWF), project P 24125-N13, and by the Research Council of Norway, Grant 207690/E20.

Additional information

Raimund M. Kovacevic: This research was partially funded by the Austrian Science Fund (FWF), project P 24125-N13.

Alois Pichler: The author gratefully acknowledges support of the Research Council of Norway (Grant 207690/E20).

Appendices

Appendix 1: Scenario approximation with Wasserstein distances

Given a probability measure P we ask for an approximating probability measure which is located on \(\Xi ^{\prime }\), that is to say, whose support is contained in \(\Xi ^{\prime }\). The following proposition reveals that the pushforward measure \(P^{{\mathbf {T}}}\), where the mapping \({\mathbf {T}}\) is defined in (ii) of the proposition, is the best approximation of P among all measures located on \(\Xi ^{\prime }\), i.e., \(P^{{\mathbf {T}}}\) satisfies

$$\begin{aligned} {\mathsf {d}}_{r}\left( P,\,P^{{\mathbf {T}}}\right) \le {\mathsf {d}}_{r}\left( P,P^{\prime }\right) \qquad \left( P^{\prime }\left( \Xi ^{\prime }\right) =1\right) . \end{aligned}$$
(27)

Proposition 1

(Lower bounds and best approximation) Let P and \(P^{\prime }\) be probability measures.

(i)

    The Wasserstein distance has the lower bound

    $$\begin{aligned} \int \nolimits _{\Xi }\min _{\xi ^{\prime }\in \Xi ^{\prime }}d\left( \xi ,\xi ^{\prime }\right) ^{r}P\left( {\mathrm {d}}\xi \right) \le {\mathsf {d}}_{r}\left( P,\,P^{\prime }\right) ^{r}. \end{aligned}$$
    (28)
(ii)

    The lower bound in (28) is attained for the pushforward measure \(P^{{\mathbf {T}}}:=P\circ {\mathbf {T}}^{-1}\) on \(\Xi ^{\prime }\), where the transport map \({\mathbf {T}}:\Xi \rightarrow \Xi ^{\prime }\) is defined by (see Footnote 3)

    $$\begin{aligned} {\mathbf {T}}\left( \xi \right) \in \mathop {\hbox {argmin}}\limits _{\xi ^{\prime }\in \Xi ^{\prime }}d\left( \xi ,\xi ^{\prime }\right) . \end{aligned}$$

    It holds that (see Footnote 4)

    $$\begin{aligned} {\mathsf {d}}_{r}\left( P,\,P^{{\mathbf {T}}}\right) ^{r}=\int \min _{\xi ^{\prime }\in \Xi ^{\prime }}d\left( \xi ,\xi ^{\prime }\right) ^{r}P\left( {\mathrm {d}}\xi \right) ={\mathbb {E}}\left[ d\left( {\text {id}}_{\Xi },{\mathbf {T}}\left( {\text {id}}_{\Xi }\right) \right) ^{r}\right] , \end{aligned}$$

    where the identity \({\text {id}}_{\Xi }\left( \xi \right) =\xi \) on \(\Xi \) is employed for notational convenience.

(iii)

    If \(\Xi =\Xi ^{\prime }\) is a vector space and \({\mathbf {T}}\) as in (ii), then

    $$\begin{aligned} {\mathsf {d}}_{r}\left( P,\,P^{\tilde{{\mathbf {T}}}}\right) \le {\mathsf {d}}_{r}\left( P,\,P^{{\mathbf {T}}}\right) , \end{aligned}$$

    where \(\tilde{{\mathbf {T}}}\) is defined by \(\tilde{{\mathbf {T}}}\left( \xi \right) :={\mathbb {E}}_{P}\left[ \tilde{\xi }\left| \,{\mathbf {T}}\left( \tilde{\xi }\right) ={\mathbf {T}}\left( \xi \right) \right. \right] \).

Proof

Let \(\pi \) be a transport plan with marginals P and \(P^{\prime }\). Then

$$\begin{aligned} \int \nolimits _{\Xi \times \Xi ^{\prime }}d\left( \xi ,\xi ^{\prime }\right) ^{r}\pi \left( {\mathrm {d}}\xi ,{\mathrm {d}}\xi ^{\prime }\right)&\ge \int \nolimits _{\Xi }\int \nolimits _{\Xi ^{\prime }}\min _{\xi ^{\prime }\in \Xi ^{\prime }}d\left( \xi ,\xi ^{\prime }\right) ^{r}\pi \left( {\mathrm {d}}\xi ,{\mathrm {d}}\xi ^{\prime }\right) \\&=\int \nolimits _{\Xi }\min _{\xi ^{\prime }\in \Xi ^{\prime }}d\left( \xi ,\xi ^{\prime }\right) ^{r}P\left( {\mathrm {d}}\xi \right) . \end{aligned}$$

Taking the infimum over \(\pi \) reveals the lower bound (28).

Define the transport plan \(\pi :=P\circ \left( {\text {id}}_{\Xi }\times {\mathbf {T}}\right) ^{-1}\) by employing the transport map \({\mathbf {T}}\). Then

$$\begin{aligned} \pi \left( A\times B\right) =P\left( \left\{ \xi :\left( \xi ,{\mathbf {T}}\left( \xi \right) \right) \in A\times B\right\} \right) =P\left( \left\{ \xi :\xi \in A\text { and }{\mathbf {T}}\left( \xi \right) \in B\right\} \right) . \end{aligned}$$

\(\pi \) is feasible, as it has the marginals \(\pi \left( A\times \Xi ^{\prime }\right) =P\left( \left\{ \xi :\xi \in A,\,{\mathbf {T}}\left( \xi \right) \in \Xi ^{\prime }\right\} \right) =P\left( A\right) \) and \(\pi \left( \Xi \times B\right) =P\left( \left\{ \xi :{\mathbf {T}}\left( \xi \right) \in B\right\} \right) =P^{{\mathbf {T}}}\left( B\right) \). For this measure \(\pi \) we thus have

$$\begin{aligned} \iint \nolimits _{\Xi \times \Xi ^{\prime }}d\left( \xi ,\xi ^{\prime }\right) ^{r}\pi \left( {\mathrm {d}}\xi ,{\mathrm {d}}\xi ^{\prime }\right) =\int \nolimits _{\Xi }d\left( \xi ,{\mathbf {T}}\left( \xi \right) \right) ^{r}P\left( {\mathrm {d}}\xi \right) =\int \nolimits _{\Xi }\min _{\xi ^{\prime }\in \Xi ^{\prime }}d\left( \xi ,\xi ^{\prime }\right) ^{r}P\left( {\mathrm {d}}\xi \right) , \end{aligned}$$

which proves (ii).

For the last assertion apply the conditional Jensen inequality (cf., e.g., Williams 1991) \(\varphi \left( {\mathbb {E}}\left( X|{\mathbf {T}}\right) \right) \le {\mathbb {E}}\left( \varphi \left( X\right) |{\mathbf {T}}\right) \) to the convex mapping \(\varphi :y\mapsto d\left( \xi ,y\right) ^{r}\) and obtain

$$\begin{aligned} d\left( \xi ,\,{\mathbb {E}}\left( {\text {id}}|{\mathbf {T}}\right) \circ {\mathbf {T}}\right) ^{r}\le {\mathbb {E}}\left( d\left( \xi ,{\text {id}}\right) ^{r}|{\mathbf {T}}\right) \circ {\mathbf {T}}. \end{aligned}$$

The measure \(\tilde{\pi }\left( A\times B\right) :=P\left( A\cap \tilde{{\mathbf {T}}}^{-1}\left( B\right) \right) \) has marginals P and \(P^{\tilde{{\mathbf {T}}}}\), from which follows that

$$\begin{aligned} {\mathsf {d}}_{r}\left( P,P^{\tilde{{\mathbf {T}}}}\right) ^{r}\le&\int d\left( \xi ,\tilde{{\mathbf {T}}}\left( \xi \right) \right) ^{r}P\left( {\mathrm {d}}\xi \right) =\int d\left( \xi ,\,{\mathbb {E}}\left( {\text {id}}|{\mathbf {T}}\right) \circ {\mathbf {T}}\left( \xi \right) \right) ^{r}P\left( {\mathrm {d}}\xi \right) \\ \le&\int {\mathbb {E}}\left( d\left( \xi ,{\text {id}}\right) ^{r}|{\mathbf {T}}\right) \left( {\mathbf {T}}\left( \xi \right) \right) P\left( {\mathrm {d}}\xi \right) =\int d\left( \xi ,{\mathbf {T}}\left( \xi \right) \right) ^{r}P\left( {\mathrm {d}}\xi \right) ={\mathsf {d}}_{r}\left( P,P^{{\mathbf {T}}}\right) ^{r}, \end{aligned}$$

which is the assertion. \(\square \)

It was addressed in the introduction that the approximation can be improved by relocating the scenarios themselves, and by allocating adapted probabilities to these scenarios. The following two sections address these issues by applying the previous Proposition 1.

1.1 Optimal probabilities

The optimal measure \(P^{{\mathbf {T}}}\) in Proposition 1 notably does not depend on the order r. Moreover, given a probability measure P, Proposition 1 (ii) allows one to find the best approximation located on finitely many points \(Q=\left\{ q_{1}\dots q_{n}\right\} \). The points \(q_{j}\in Q\) are often called quantizers, and we adopt this notion in what follows (see the œuvre of Gilles Pagès, e.g., Bally et al. (2005), for a comprehensive treatment).

Consider now \(\Xi ^{\prime }:=Q\) and define \(p_{j}^{*}:=P\left( {\mathbf {T}}=q_{j}\right) \); the collection of distinct sets \(\left\{ {\mathbf {T}}=q_{j}\right\} \) is a tessellation of \(\Xi \) (a Voronoi tessellation, see Graf and Luschgy 2000). Set \(P^{Q}:=P^{{\mathbf {T}}}=\sum \nolimits _{j}p_{j}^{*}\cdot \delta _{q_{j}}\), as above. Then \({\mathsf {d}}_{r}\left( P,\,P^{Q}\right) ^{r}=\int \min _{q\in Q}d\left( \xi ,q\right) ^{r}P\left( {\mathrm {d}}\xi \right) \), and by Proposition 1 no better approximation is possible.

According to Proposition 1, the best approximating measure for \(P=\sum \nolimits _{i}p_{i}\delta _{\xi _{i}}\) which is located on Q is given by \(P^{Q}=\sum \nolimits _{j}p_{j}^{*}\delta _{q_{j}}\). For a discrete measure this can be formulated as the linear program (writing \(d_{i,j}:=d\left( \xi _{i},q_{j}\right) \))

$$\begin{aligned} \begin{array}{ll} \begin{array}{l} \text {minimize }\\ \quad \text {(in }\pi ) \end{array} &{} \sum \nolimits _{i,j}d_{i,j}^{r}\pi _{i,j}\\ \text {subject to } &{} \sum \nolimits _{j}\pi _{i,j}=p_{i},\\ &{} \pi _{i,j}\ge 0, \end{array} \end{aligned}$$

which is solved by the optimal transport plan

$$\begin{aligned} \pi _{i,j}^{*}:={\left\{ \begin{array}{ll} p_{i} &{} \text{ if } d\left( \xi _{i},q_{j}\right) =\min _{q\in Q}d\left( \xi _{i},q\right) \\ 0 &{} \text{ else }, \end{array}\right. } \end{aligned}$$
(29)

such that

$$\begin{aligned} p_{j}^{*}=\sum \limits _{i}\pi _{i,j}^{*}\qquad \text{ and } \qquad {\mathsf {d}}_{r}\left( P,P^{Q}\right) ^{r}={\mathbb {E}}_{\pi ^{*}} \left( d^{r}\right) . \end{aligned}$$
(30)

Observe as well that the matrix \(\pi ^{*}\) in (29) has just \(\left| \Xi \right| \) non-zero entries, as in every row i of \(\pi ^{*}\) there is just one non-zero entry \(\pi _{i,j}^{*}\). This is a simplification in comparison with Remark 2, as the solution \(\pi \) of (4) has \(\left| \Xi \right| +\left| \Xi ^{\prime }\right| -1\) non-zero entries, if the probability measure \(P^{\prime }\) is specified.

Finally, given the support points Q, it is an easy exercise to look up the closest points according to (29) and to sum their probabilities according to (30), so that the solution of (27), the closest measure to P located on Q, is obtained immediately as \(P^{Q}=\sum \nolimits _{j}p_{j}^{*}\delta _{q_{j}}\).

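For the discrete case, this computation fits in a few lines. The following minimal sketch (Python/NumPy; the function name and the choice of the Euclidean metric for d are our own assumptions) builds the optimal transport map of (29) implicitly and returns the optimal probabilities of (30):

```python
import numpy as np

def optimal_probabilities(xi, p, Q, r=2):
    """Closest measure on Q to P = sum_i p_i * delta_{xi_i}, cf. (29)-(30).

    xi: (N, dim) scenario points, p: (N,) probabilities, Q: (n, dim) quantizers.
    Returns the optimal weights p* on Q and the distance d_r(P, P^Q).
    """
    # distance matrix d_{i,j} = d(xi_i, q_j), here the Euclidean distance
    D = np.linalg.norm(xi[:, None, :] - Q[None, :, :], axis=2)
    # transport map T: each scenario is moved to its closest quantizer, cf. (29)
    j_star = D.argmin(axis=1)
    # optimal probabilities p*_j: the total mass transported to q_j, cf. (30)
    p_star = np.bincount(j_star, weights=p, minlength=len(Q))
    d_r = (p * D[np.arange(len(xi)), j_star] ** r).sum() ** (1 / r)
    return p_star, d_r
```

In line with the observation on (29) above, the implicit transport plan has exactly one non-zero entry per row; quantizers attracting no mass simply receive weight zero.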

1.2 Optimal supporting points—facility location

Given the previous results on optimal probabilities, the problem of finding a sufficiently good approximation of P in the Wasserstein distance reduces to the problem of finding good locations Q, that is, to minimizing the function

$$\begin{aligned} \left\{ q_{1},\dots q_{n}\right\} \mapsto {\mathsf {d}}_{r}\left( P,\,P_{\left\{ q_{1},\dots q_{n}\right\} }\right) ^{r}= & {} \int \min _{q\in \left\{ q_{1},\dots q_{n}\right\} }d\left( \xi ,q\right) ^{r}P\left( {\mathrm {d}}\xi \right) \nonumber \\= & {} {\mathbb {E}}_{\xi }\left[ \min _{q\in \left\{ q_{1},\dots q_{n}\right\} }d\left( \xi ,q\right) ^{r}\right] . \end{aligned}$$
(33)

Minimizing (33) with respect to the quantizers \(\left\{ q_{1},\dots q_{n}\right\} \) is often referred to as facility location, as in Drezner and Hamacher (2002). This problem is not convex and in general has no closed-form solution; it hence has to be handled with adequate numerical algorithms. Moreover, it is well known that facility location problems are NP-hard.

For the important case of the quadratic Wasserstein distance, Proposition 1 (iii) and its proof give rise to an adaptation of the k-means clustering algorithm [also referred to as Lloyd's algorithm, cf. Lloyd (1982)], which is described in Algorithm 2 and sketched below. In this case the conditional average is the best approximation in terms of the Euclidean norm, so that the algorithm terminates after finitely many iterations at a local minimum.
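A minimal sketch of such an adaptation for \(r=2\) and the Euclidean distance follows; it alternates the two improvement steps of Proposition 1 (ii) and (iii) and is meant to convey the idea of Algorithm 2, not to reproduce it verbatim:

```python
import numpy as np

def lloyd_quantizers(xi, p, Q0, max_iter=1000):
    """k-means type iteration for the quadratic Wasserstein distance:
    alternate between reassigning mass to the closest quantizer (the
    Voronoi tessellation of Proposition 1 (ii)) and moving each quantizer
    to the conditional mean of its cell (Proposition 1 (iii))."""
    Q = Q0.copy()
    for _ in range(max_iter):
        D = np.linalg.norm(xi[:, None, :] - Q[None, :, :], axis=2)
        j = D.argmin(axis=1)                      # Voronoi cells {T = q_k}
        Q_new = Q.copy()
        for k in range(len(Q)):
            mass = p[j == k].sum()
            if mass > 0:                          # conditional mean E[xi | T = q_k]
                Q_new[k] = (p[j == k][:, None] * xi[j == k]).sum(axis=0) / mass
        if np.allclose(Q_new, Q):                 # only finitely many tessellations
            break
        Q = Q_new
    return Q, np.bincount(j, weights=p, minlength=len(Q))
```

Each sweep can only decrease the objective, which is exactly the monotonicity asserted in Theorem 4 below.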

Theorem 4

The measures \(P^{k}\) generated by Algorithm 2 are improved approximations of P: they satisfy

$$\begin{aligned} {\mathsf {d}}_{r}\left( P,P^{k+1}\right) \le {\mathsf {d}}_{r}\left( P,P^{k}\right) , \end{aligned}$$

and the algorithm terminates after finitely many iterations.

In the case of the quadratic Wasserstein distance Algorithm 2 terminates at a local minimum \(\left\{ q_{1},\dots q_{n}\right\} \) of (33).

Proof

Algorithm 2 is an iterative refinement technique, which finds the measure

$$\begin{aligned} P^{k}=\sum \limits _{j=1}^{n}P\left( T_{j}^{k}\right) \delta _{q_{j}^{k}} \end{aligned}$$

after k iterations. By the construction in (32), each iteration is an improvement due to Proposition 1 (ii) and (iii), and hence

$$\begin{aligned} {\mathsf {d}}_{r}\left( P,P^{k+1}\right) \le {\mathsf {d}}_{r}\left( P,P^{k}\right) . \end{aligned}$$

The algorithm terminates after finitely many iterations because there are just finitely many possible Voronoi partitions \(T_{j}\).

For the Euclidean distance and \(r=2\) the expectation \({\mathbb {E}}\left( \xi \right) =\sum \nolimits _{i}p_{i}\xi _{i}\) minimizes the function

$$\begin{aligned} q\mapsto \sum \limits _{i}p_{i}\cdot \left\| q-\xi _{i}\right\| _{2}^{2}={\mathbb {E}}_{\xi }\left( \left\| q-{\xi }\right\| _{2}^{2}\right) . \end{aligned}$$

In this case the quantizers \(\left\{ q_{1},\dots q_{n}\right\} \) at termination thus constitute a local minimum of (33). \(\square \)

For distances other than the quadratic Wasserstein distance, \(P^{k}\) may provide a good starting point for solving (33), but in general it is not already a local (let alone global) minimum.

Appendix 2: Stochastic processes and trees

1.1 Any tree induces a filtration

Any tree with height T and finitely many nodes \({\mathcal {N}}\) naturally induces a filtration \({\mathcal {F}}\): first use \({\mathcal {N}}_{T}\), the set of nodes at the final stage T, as sample space. For any \(n\in {\mathcal {N}}\) define the atom (see Footnote 5) \(a\left( n\right) \subset {\mathcal {N}}_{T}\) in a backward recursive way by

$$\begin{aligned} a\left( n\right) :={\left\{ \begin{array}{ll} \left\{ n\right\} &{} \text{ if } n\in {\mathcal {N}}_{T}\\ \bigcup _{j\in n_{+}}a\left( j\right) &{} \text{ else }. \end{array}\right. } \end{aligned}$$

Employing these atoms, the related sigma algebra is defined by

$$\begin{aligned} {\mathcal {F}}_{t}:=\sigma \left( a(n):\,n\in {\mathcal {N}}_{t}\right) . \end{aligned}$$

From the construction of the atoms it is evident that \({\mathcal {F}}_{0}=\left\{ \emptyset ,\,{\mathcal {N}}_{T}\right\} \) for a rooted tree and that \({\mathcal {F}}=\left( {\mathcal {F}}_{0},\ldots {\mathcal {F}}_{T}\right) \) is a filtration on the sample space \({\mathcal {N}}_{T}\), i.e. it holds that \({\mathcal {F}}_{t}\subset {\mathcal {F}}_{t+1}\). Notice that node m is a predecessor of n, i.e. \(m\in {\mathcal {A}}(n)\), if and only if

$$\begin{aligned} a\left( m\right) \in {\mathcal {A}}(a\left( n\right) ). \end{aligned}$$
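The backward recursion is immediate to implement. A small sketch follows, assuming the tree is given by a map children from each node to its successors \(n_{+}\) (the data layout and names are ours):

```python
def atoms(children):
    """Backward-recursive atoms a(n) of a finite tree: a(n) = {n} for a
    leaf n in N_T, otherwise the union of a(j) over the successors j."""
    a = {}
    def rec(n):
        if n not in a:
            succ = children.get(n, [])
            a[n] = frozenset([n]) if not succ else frozenset().union(*(rec(j) for j in succ))
        return a[n]
    for n in children:
        rec(n)
    return a

# Example: the rooted tree 0 -> {1, 2}, 1 -> {3, 4}, 2 -> {5} yields
# atoms({0: [1, 2], 1: [3, 4], 2: [5]})[0] == frozenset({3, 4, 5}).
```

The sigma algebra \({\mathcal {F}}_{t}\) is then generated by the atoms \(a\left( n\right) \), \(n\in {\mathcal {N}}_{t}\), exactly as in the definition above.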

Employing the atoms \(a\left( n\right) \), a tree process can be defined by

$$\begin{aligned} \nu :\left\{ 0,\dots T\right\} \times {\mathcal {N}}_{T}&\rightarrow {\mathcal {N}}\\ \left( t,\,i\right)&\mapsto n \text{ if } i\in a\left( n\right) \text{ and } n\in {\mathcal {N}}_{t}\,\left( \text{ i.e. } n\in {\mathcal {A}}(i)\right) , \end{aligned}$$

such that each

$$\begin{aligned} \nu _{t}:{\mathcal {N}}_{T}&\rightarrow {\mathcal {N}}_{t}\\ i&\mapsto \nu \left( t,i\right) \end{aligned}$$

is \({\mathcal {F}}_{t}\)-measurable. Moreover, the process \(\nu \) is adapted to its natural filtration, i.e.

$$\begin{aligned} {\mathcal {F}}_{t}=\sigma \left( \nu _{0},\dots \nu _{t}\right) = \sigma \left( \nu _{t}\right) . \end{aligned}$$

It is natural to introduce the notation \(i_{t}:=\nu _{t}\left( i\right) \), which denotes the state of the tree process at stage t for any final outcome \(i\in {\mathcal {N}}_{T}\). It then holds that \(i_{T}=i\), that \(i_{t}\in {\mathcal {A}}(i_{\tau })\) whenever \(t\le \tau \), and finally, for a rooted tree, that \(i_{0}=0\). The sample path from the root node 0 to a final node \(i\in {\mathcal {N}}_{T}\) is

$$\begin{aligned} \left( \nu _{t}\left( i\right) \right) _{t=0}^{T}=\left( i_{t}\right) _{t=0}^{T}. \end{aligned}$$
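In code, the sample path and hence the tree process \(\nu \) can be read off from a parent map; a small sketch (parent assigns to each non-root node its direct predecessor, again an assumption of ours):

```python
def sample_path(parent, i, root=0):
    """Sample path (i_0, ..., i_T) from the root to the leaf i, where
    i_t = nu_t(i) is the unique predecessor of i at stage t."""
    path = [i]
    while path[-1] != root:
        path.append(parent[path[-1]])
    return path[::-1]
```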

1.2 Any filtration induces a tree

On the other hand, given a filtration \({\mathcal {F}}=\left( {\mathcal {F}}_{0},\ldots {\mathcal {F}}_{T}\right) \) on a finite sample space \(\Omega \), it is possible to define a tree representing the filtration: just consider the sets \(A_{t}\) collecting the atoms that generate \({\mathcal {F}}_{t}\) (i.e., \({\mathcal {F}}_{t}=\sigma \left( A_{t}\right) \)), and define the nodes

$$\begin{aligned} {\mathcal {N}}:=\left\{ \left( a,t\right) :a\in A_{t}\right\} \end{aligned}$$

and the arcs

$$\begin{aligned} A=\left\{ \left( \left( a,t\right) ,\left( b,t+1\right) \right) :a\in A_{t},\,b\in A_{t+1},\,b\subset a\right\} . \end{aligned}$$

\(\left( {\mathcal {N}},A\right) \) then is a directed tree respecting the filtration \({\mathcal {F}}\).

Hence filtrations on a finite sample space and finite trees are equivalent structures up to possibly different labels, and in the following, we will not distinguish between them.
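A sketch of this converse construction, assuming the atoms are given stage by stage as lists of frozensets (one list \(A_{t}\) per stage t; a layout of our choosing), might read:

```python
def filtration_to_tree(atom_lists):
    """Tree (N, A) representing a filtration: one node per atom and stage,
    and an arc from (a, t) to (b, t+1) whenever b refines a."""
    nodes = [(a, t) for t, atoms_t in enumerate(atom_lists) for a in atoms_t]
    arcs = [((a, t), (b, t + 1))
            for t in range(len(atom_lists) - 1)
            for a in atom_lists[t]
            for b in atom_lists[t + 1]
            if b <= a]                  # b ⊆ a: (a, t) is the direct predecessor
    return nodes, arcs
```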

1.3 Measures on trees

Let P be a probability measure on \({\mathcal {F}}_{T}\), such that \(\left( {\mathcal {N}}_{T},{\mathcal {F}}_{T},P\right) \) is a probability space. The notions introduced above allow one to extend the probability measure to the entire tree via the definition (cf. Fig. 3)

$$\begin{aligned} P^{\nu }\left( A\right) :=P\left( \bigcup _{t\in \left\{ 0,\dots T\right\} }\nu _{t}^{-1}\left( A\cap {\mathcal {N}}_{t}\right) \right) \qquad \left( A\subset {\mathcal {N}}\right) . \end{aligned}$$

In particular this definition includes the unconditional probabilities

$$\begin{aligned} P^{\nu }\left( \left\{ n\right\} \right) =:P\left( n\right) \end{aligned}$$

for each node. Furthermore it can be used to define conditional probabilities

$$\begin{aligned} P\left( \left. \left\{ n\right\} \right| \left\{ m\right\} \right) =:P\left( \left. n\right| m\right) , \end{aligned}$$

representing the probability of transition from n to m, if \(m\in {\mathcal {A}}(n)\).
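A minimal sketch of this extension (reusing the hypothetical children map introduced above):

```python
def node_probabilities(children, leaf_prob):
    """Unconditional probability P(n) of each node: the mass P(a(n)) of
    the leaves collected in the atom a(n)."""
    P = {}
    def rec(n):
        if n not in P:
            succ = children.get(n, [])
            P[n] = leaf_prob[n] if not succ else sum(rec(j) for j in succ)
        return P[n]
    for n in children:
        rec(n)
    return P

def transition(P, m, n):
    """Conditional probability P(n | m) of a transition from m to a
    descendant n."""
    return P[n] / P[m]
```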

1.4 Value and decision processes

In a multi-period, discrete time setup the outcomes or realizations of a stochastic process are of interest rather than the concrete model (the sample space): in focus is the space

$$\begin{aligned} \Xi :=\Xi _{0}\times \dots \times \Xi _{T} \end{aligned}$$

of the stochastic process

$$\begin{aligned} \xi :\left\{ 0,\dots T\right\} \times {\mathcal {N}}_{T}\rightarrow \Xi . \end{aligned}$$

The process is measurable with respect to each \({\mathcal {F}}_{t}=\sigma \left( \nu _{t}\right) \), from which follows (cf. (Shiryaev 1996, Theorem II.4.3)) that \(\xi \) can be decomposed as

$$\begin{aligned} \xi _{t}=\xi _{t}\circ \nu _{t}, \end{aligned}$$

(i.e. \({\text {id}}_{t}\circ \xi =\xi _{t}\circ \nu _{t}\), where \({\text {id}}_{t}:\Xi \rightarrow \Xi _{t}\) is the natural projection). Notice that \(\xi _{t}\in \Xi _{t}\) is an observation of the stochastic process at stage t and measurable with respect to \({\mathcal {F}}_{t}\) (in symbols \(\xi _{t}\lhd {\mathcal {F}}_{t}\)), and at this stage t all prior observations

$$\begin{aligned} \xi _{0:t}:=\left( \xi _{0},\dots \xi _{t}\right) \end{aligned}$$

are \({\mathcal {F}}_{t}\)-measurable as well.
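Attaching a value to every node makes the decomposition \(\xi _{t}=\xi _{t}\circ \nu _{t}\) concrete; a tiny sketch (values stored per node and sample_path reused from above; names are ours):

```python
def observation(value, parent, i, t, root=0):
    """Observation xi_t at stage t along the path to the final outcome i:
    by xi_t = xi_t o nu_t it depends on i only through the node nu_t(i)."""
    return value[sample_path(parent, i, root)[t]]
```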

In multistage stochastic programming, a decision maker can influence the results to be expected at the very end of the process by making a decision \(x_{t}\) at any stage t, based on the information available up to the time the decision is made, that is \(\xi _{0:t}\). The decision has to be taken prior to the next observation \(\xi _{t+1}\) (e.g., a decision about a new portfolio allocation has to be made before knowing the next day's security prices).

This nonanticipativity property of the decisions is modeled by the assumption that any \(x_{t}\) is measurable with respect to \({\mathcal {F}}_{t}\) (\(x_{t}\lhd {\mathcal {F}}_{t}\)), such that again

$$\begin{aligned} x_{t}=x_{t}\circ \nu _{t}, \end{aligned}$$

i.e. \({\text {id}}_{t}\circ x=x_{t}\circ \nu _{t}\).


Cite this article

Kovacevic, R.M., Pichler, A. Tree approximation for discrete time stochastic processes: a process distance approach. Ann Oper Res 235, 395–421 (2015). https://doi.org/10.1007/s10479-015-1994-2
