Abstract
This paper proposes a deep learning approach for solving optimal stopping problems and pricing high-dimensional American-style options. Because the method partitions the state space, the network structure does not need to be recomputed when the asset price changes, which makes tracking valuations more efficient. We also give a theoretical proof of the existence of a deep learning network that determines the optimal stopping time via state-space partition, present convergence proofs for the estimators, and test the method on Bermudan max-call options.
Data Availability
The data that support this study are available from the corresponding author upon reasonable request.
References
Aliprantis CD, Border KC (2006) Infinite dimensional analysis. Springer, Berlin. https://doi.org/10.1007/3-540-29587-9
Bally V, Pagès G, Printems J (2003) First-order schemes in the numerical quantization method. Math Financ 13(1):1–16. https://doi.org/10.1111/1467-9965.t01-1-00002
Barraquand J, Martineau D (1995) Numerical valuation of high-dimensional multivariate American securities. J Financ Quant Anal 30(3):383–405. https://doi.org/10.2307/2331347
Becker S, Cheridito P, Jentzen A (2019) Deep optimal stopping. J Mach Learn Res 20:2712–2736
Belomestny D (2011) On the rates of convergence of simulation-based optimization algorithms for optimal stopping problems. Ann Appl Probab 21(1):215–239. https://doi.org/10.1214/10-AAP692
Belomestny D, Schoenmakers J, Dickmann F (2013) Multilevel dual approach for pricing American style derivatives. Financ Stoch 17(4):717–742. https://doi.org/10.1007/s00780-013-0208-5
Bottou L, Curtis FE, Nocedal J (2018) Optimization methods for large-scale machine learning. SIAM Rev 60(2):223–311. https://doi.org/10.1137/16M1080173
Boyle PP (1977) Options: a Monte Carlo approach. J Financ Econ 4(3):323–338. https://doi.org/10.1016/0304-405X(77)90005-8
Broadie M, Cao M (2008) Improved lower and upper bound algorithms for pricing American options by simulation. Quant Financ 8(8):845–861. https://doi.org/10.1080/14697680701763086
Caccetta L, Qu B, Zhou G (2011) A globally and quadratically convergent method for absolute value equations. Comput Optim Appl 48(1):45–58. https://doi.org/10.1007/s10589-009-9242-9
Chen N, Glasserman P (2007) Additive and multiplicative duals for American option pricing. Financ Stoch 11(2):153–179. https://doi.org/10.1007/s00780-006-0031-3
Chernogorova TP, Koleva MN, Valkov RL (2018) A two-grid penalty method for American options. Comput Appl Math 37(3):2381–2398. https://doi.org/10.1007/s40314-017-0457-6
Cox JC, Ross SA, Rubinstein M (1979) Option pricing: a simplified approach. J Financ Econ 7(3):229–263. https://doi.org/10.1016/0304-405X(79)90015-1
Dudley RM (1974) Metric entropy of some classes of sets with differentiable boundaries. J Approx Theory 10(3):227–236. https://doi.org/10.1016/0021-9045(74)90120-8
Geske R, Roll R (1984) On valuing American call options with the Black-Scholes European formula. J Financ 39(2):443–455. https://doi.org/10.1111/j.1540-6261.1984.tb02319.x
Haugh MB, Kogan L (2004) Pricing American options: a duality approach. Oper Res 52(2):258–270. https://doi.org/10.1287/opre.1030.0070
Imaizumi M, Fukumizu K (2019) Deep neural networks learn non-smooth functions effectively. In: Proceedings of Machine Learning Research, 22nd International Conference on Artificial Intelligence and Statistics, Naha. 89:869–878
Jain S, Oosterlee CW (2012) Pricing high-dimensional Bermudan options using the stochastic grid method. Int J Comput Math 89(9):1186–1211. https://doi.org/10.1080/00207160.2012.690035
Jin X, Tan HH, Sun J (2007) A state-space partitioning method for pricing high-dimensional American-style options. Math Financ 17(3):399–426. https://doi.org/10.1111/j.1467-9965.2007.00309.x
Jin X, Li X, Tan HH, Wu Z (2013) A computationally efficient state-space partitioning approach to pricing high-dimensional American options via dimension reduction. Eur J Oper Res 231(2):362–370. https://doi.org/10.1016/j.ejor.2013.05.035
Kohler M, Langer S (2021) On the rate of convergence of fully connected deep neural network regression estimates. Ann Statist 49(4):2231–2249. https://doi.org/10.1214/20-AOS2034
Kohler M, Krzyzak A, Todorovic N (2010) Pricing of high-dimensional American options by neural networks. Math Financ 20(3):383–410. https://doi.org/10.1111/j.1467-9965.2010.00404.x
Leshno M, Lin VY, Pinkus A, Schocken S (1993) Multilayer feedforward networks with a nonpolynomial activation function can approximate any function. Neural Netw 6(6):861–867. https://doi.org/10.1016/S0893-6080(05)80131-5
Li X, Wu Z (2006) A semi-analytic method for valuing high-dimensional options on the maximum and minimum of multiple assets. Ann Financ 2(2):179–205. https://doi.org/10.1007/s10436-005-0034-7
Longstaff FA, Schwartz ES (2001) Valuing American options by simulation: a simple least-squares approach. Rev Financ Stud 14(1):113–147. https://doi.org/10.1093/rfs/14.1.113
Merton RC, Brennan MJ, Schwartz ES (1977) The valuation of American put options. J Financ 32(2):449–462. https://doi.org/10.1111/j.1540-6261.1977.tb03284.x
Rogers LCG (2002) Monte Carlo valuation of American options. Math Financ 12(3):271–286. https://doi.org/10.1111/1467-9965.02010
Rogers LCG (2010) Dual valuation and hedging of Bermudan options. SIAM J Financ Math 1(1):604–608. https://doi.org/10.1137/090772198
Schmidhuber J (2015) Deep learning in neural networks: an overview. Neural Netw 61:85–117. https://doi.org/10.1016/j.neunet.2014.09.003
Shrestha A, Mahmood A (2019) Review of deep learning algorithms and architectures. IEEE Access 7:53040–53065. https://doi.org/10.1109/ACCESS.2019.2912200
Sirignano J, Spiliopoulos K (2018) DGM: a deep learning algorithm for solving partial differential equations. J Comput Phys 375:1339–1364. https://doi.org/10.1016/j.jcp.2018.08.029
Tilley JA (1993) Valuing American options in a path simulation model. Trans Soc Actuaries 45:83–104
Tsitsiklis JN, Van Roy B (2001) Regression methods for pricing complex American-style options. IEEE Trans Neural Netw 12(4):694–703. https://doi.org/10.1109/72.935083
Acknowledgements
We thank Professor Xun Li for his valuable comments on the idea of the paper. We thank the associate editor and the reviewers for their helpful feedback that improved this paper. This work is partially supported by the National Key R&D Program of China (grant no. 2023YFA1009200) and the National Natural Science Foundation of China (grant no. 11871244).
Funding
National Key R&D Program of China (2023YFA1009200), National Natural Science Foundation of China (11871244).
Appendices
Appendix A: Structure of deep neural networks and proof of Lemmas
Since the stopping decision functions \(f_i^j, j=1, \ldots , a_i, i=0,1, \ldots , N-1,\) can only take discrete values, we introduce, based on the notion of piecewise smooth functions proposed by Imaizumi and Fukumizu (2019) and the boundary fragment classes developed by Dudley (1974), a special class of functions that take the value 1 on regions of the state space whose boundaries are formed by a series of smooth functions. For that purpose, we first give the definitions of \(\left( p,C\right) \)-smoothness and edge functions.
Definition 1
Let \(p=q+s\) for some \(q \in {\mathbb {N}}_0\) and \(0<s \le 1\). A function \(m: {\mathbb {R}}^d \rightarrow {\mathbb {R}}\) is called \(\left( p, C\right) \)-smooth if, for every \(\alpha =\left( \alpha _1, \ldots , \alpha _d\right) \in {\mathbb {N}}_0^d\) with \(\sum _{j=1}^d \alpha _j=q\), the partial derivative \(\partial ^q m /\left( \partial x_1^{\alpha _1} \cdots \partial x_d^{\alpha _d}\right) \) exists and satisfies
$$\left| \frac{\partial ^q m}{\partial x_1^{\alpha _1} \cdots \partial x_d^{\alpha _d}}\left( {\textbf{x}}\right) - \frac{\partial ^q m}{\partial x_1^{\alpha _1} \cdots \partial x_d^{\alpha _d}}\left( {\textbf{z}}\right) \right| \le C \Vert {\textbf{x}}-{\textbf{z}}\Vert ^s$$
for all \({\textbf{x}}, {\textbf{z}} \in {\mathbb {R}}^d\), where \(\Vert \cdot \Vert \) denotes the Euclidean norm.
Definition 2
Let \(D \in \{1, \ldots , d\}\) and \(x \in {\mathbb {R}}^d\). A function is called a D-edge function if it is an element of the set
For \(\lambda , J, R \in {\mathbb {N}}\) and \({\mathcal {P}} \subseteq [1, \infty ) \times {\mathbb {N}}\), we also define
Obviously, the functions in \({\mathcal {K}}\left( \lambda , J, {\mathcal {P}}\right) \) take the value 1 on some blocks and 0 on the remaining regions, and they may fail to be differentiable, or may even be discontinuous, at the edges of these regions. Thus, \(f_{i}^{j}\left( x\right) \in {\mathcal {K}}\left( \lambda _i, J_i, {\mathcal {P}}\right) \), where \(\lambda _i, \ J_i \in {\mathbb {N}}\) and \({\mathcal {P}} \subseteq [1, \infty ) \times {\mathbb {N}}\). It is worth noting that each edge function in the definition of \(f_i^j(x)\) can have a different smoothness \(p_{n, i}=q_{n, i}+s_{n, i}\) and a different input dimension \(D_{n, i}\), where \(0 \le n \le N\), \(1 \le i \le \lambda _n\), \(\left( p_{n, i}, D_{n, i}\right) \in {\mathcal {P}}\).
Lemma 8
Let \(\lambda , J \in {\mathbb {N}}\), \({\mathcal {P}} \subseteq [1, \infty ) \times {\mathbb {N}}\). For arbitrary \(f\left( x \right) \in {\mathcal {K}}\left( \lambda , J, {\mathcal {P}}\right) \), there exists a set of \(\left( p, C\right) \)-smooth functions \(f_i\left( x \right) , i=1,\ldots ,\) \(\sum _{j=1}^J R_j\), such that
Proof
The result follows directly from the definition of \({\mathcal {K}}\left( \lambda , J, {\mathcal {P}}\right) \). \(\square \)
Lemma 9
For \(x \in \left[ 0,1\right] ^d\), there exists a neural network \(f_{\text {mult}}\) with the network architecture \(\left( \left\lceil \log _2 d\right\rceil , 18 d\right) \) such that
Proof
Based on Lemma 4 and Lemma 20 of Kohler and Langer (2021), there exists a network \(f_{sq}\) with the network architecture \(\left( 1,18\right) \) satisfying \(f_{sq}\left( m, n\right) =|m+n|-|m-n|\) for \(m,n \in \left[ 0,1\right] \). Let
In the first layer of \(f_{\text {mult}}\), we compute
which can be done with \(18 \cdot 2^{w-1} \le 18 d\) neurons. The output of the first layer is a vector of length \(2^{w-1}\). This process is repeated on the resulting vectors until the output is one-dimensional. If \(m=0\), then \(f_{sq}\left( m, n\right) =|n|-|n|=0\). By mathematical induction, if an element of the vector \(x\) equals 0, then \(f_{\text {mult}}(x)=0\). \(\square \)
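For intuition only, the following Python sketch (ours, not part of the paper) traces the pairwise tree reduction described above, writing \(f_{sq}\) through its closed form \(|m+n|-|m-n|\) instead of as a \(\left( 1,18\right) \) ReLU network, and assuming for simplicity that the input length is a power of two; it merely illustrates the zero-propagation property used at the end of the proof.

```python
def f_sq(m, n):
    # closed form of the (1, 18)-architecture block from Kohler and Langer (2021)
    return abs(m + n) - abs(m - n)

def f_mult(x):
    # pairwise tree reduction over ceil(log2 d) layers, as in the proof of Lemma 9;
    # this sketch assumes len(x) is a power of two
    v = list(x)
    while len(v) > 1:
        v = [f_sq(v[2 * i], v[2 * i + 1]) for i in range(len(v) // 2)]
    return v[0]

print(f_mult([0.7, 0.0, 0.9, 0.4]))  # -> 0.0: a zero entry propagates to the output
print(f_mult([0.5, 1.0, 1.0, 1.0]))  # all entries positive -> positive output
```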
Lemma 10
For \(x \in \left[ 0,1\right] \), there exists a neural network \(f_{\text {demo}}\) with the network architecture \(\left( 3,2\right) \) such that
where \(c_2 \ge 1\) is a constant.
Proof
Let \(f_{\text {demo}}\left( x\right) =-\sigma \left( -c_2\sigma \left( x\right) +1\right) +1\), where \(\sigma \) denotes the ReLU activation function. This completes the proof. \(\square \)
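As a quick numerical check (ours, not part of the paper), the snippet below evaluates the formula from the proof with NumPy; with this formula, \(f_{\text {demo}}\) equals 0 for \(x \le 0\), equals \(c_2 x\) on \(\left( 0, 1/c_2\right) \), and equals 1 for \(x \ge 1/c_2\), so it mimics the indicator \({\textbf{1}}_{(0, \infty )}\) outside a transition region of width \(1/c_2\). The test points are arbitrary.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def f_demo(x, c2=100.0):
    # formula from the proof of Lemma 10: -sigma(-c2 * sigma(x) + 1) + 1
    return -relu(-c2 * relu(x) + 1.0) + 1.0

x = np.array([-0.5, 0.0, 0.005, 0.02, 0.5])
print(f_demo(x))  # [0. 0. 0.5 1. 1.]: 0 for x <= 0, c2*x on (0, 1/c2), 1 for x >= 1/c2
```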
Lemma 11
Under Assumption 2, let \(f\in {\mathcal {K}}(\lambda , J, {\mathcal {P}})\), where \(\lambda , J \in {\mathbb {N}}\) and \({\mathcal {P}} \subseteq [1, \infty ) \times {\mathbb {N}}\). Then, for sufficiently large \(Q_i \in {\mathbb {N}}\), \(i=1,\ldots ,\lambda \), there exist \(D_{i}, \ p_{i}, \ q_{i}, \ R_{i}\), \(i=1,\ldots ,\lambda \), and a neural network \(f^\theta \in {\mathcal {F}} \left( L,r\right) \) with the property that
where \(c_3\) is a constant,
Proof
According to Lemma 8, there exists a set of functions \(f_i\left( x \right) \), \(i=1,\ldots ,\sum _{j=1}^J R_j\), each of which is \(\left( p_i,C \right) \)-smooth with \(p_i=q_i+s_i\) for some \(q_i \in {\mathbb {N}}_0\) and \(0<s_i \le 1\). Based on Theorem 2 of Kohler and Langer (2021), there exists \({\hat{\phi }}_{1, i}\) with the network architecture \( \left( L_i,r_i\right) \) such that
where \(c_2^i\) is a constant,
Without loss of generality, for \(1 \le u \le D_{i}\), let \( x_{i_1}=x_{u}\), \(g_i=f_i+x_1\) and \(\phi _{1, i}^{\prime }={\hat{\phi }}_{1, i}+x_{1}\),
For fixed \(\left( x_{2}, \ldots , x_{D_i}\right) \in \left[ -c_1, c_1\right] ^{D_i-1}\), \({\textbf{1}}_{\left\{ f_i>0, {\hat{\phi }}_{1, i}<0\right\} }={\textbf{1}}_{\left[ \phi _{1, i}^{\prime }, g_i\right) }\). Thus,
Similarly,
Notice that \(\left( b \vee 0\right) +\left( \left( -b\right) \vee 0\right) =|b|\); hence, for any \(x \in \left[ -c_1, c_1\right] ^d\), we have
for a constant \(c_4^i>0\).
Let
By Lemmas 9 and 10, given a sufficiently small \(c_2\), \(\prod \left( {\textbf{1}}_{(0, \infty )}\left( x \right) \right) \) and \({\textbf{1}}_{(0, \infty )}\left( x \right) \) can be implemented through the networks \(f_{\text {mult}}(x)\) and \(f_{\text {demo}}\left( x\right) \). Therefore, \(f^\theta \in {\mathcal {F}} \left( L,r\right) \), where
Moreover,
where \(c_3,c_5\) are constants. \(\square \)
Remark 4
Lemma 11 illustrates that a sufficiently deep neural network can approximate the decision function very well. Based on Theorem 2 introduced by Kohler and Langer (2021), a sufficiently wide neural network can also yield similar results. More specifically, replacing L and r with
Lemma 11 still holds.
Appendix B: Flow chart and pseudocode for DLSSPM
In Fig. 2, a flow chart is given to help explain the main steps in the implementation of DLSSPM. Detailed pseudocode for the DLSSPM implementation for multi-asset maximum call options is also provided, where the underlying assets follow correlated geometric Brownian motions. The pseudocode is based on the following assumptions (an illustrative Python sketch of the simulation helpers follows the assumption list, and a sketch of the backward-induction loop follows the variable list):
- The time to maturity is uniformly divided into N intervals
- Points in a Sobol sequence are available by calls to the function Sobol
- Random standard normal numbers are available by calls to the function RSN
- The inverse of the standard normal distribution function is available by calls to the function InvN
- Cholesky factorization can be performed by calls to the function Cholesky
- Gradient-based optimal parameters are available by calls to the function GBP
- Discount factors are available by calls to the function Discount
- The position of the element in a sequence closest to a given number can be found by calls to the function Near
- Fully connected deep neural networks with ReLU as the activation function, where \({\textbf{1}}_{\left( 0,\infty \right) }\) is applied to the output, are available by calls to the function DNNs
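The pseudocode itself is not reproduced in this excerpt. As an illustration only, the sketch below shows one way the simulation-related helpers could be combined to fill the path array S under the correlated geometric Brownian motion model; the function names, the exact simulation step, and the use of pseudorandom normals in place of Sobol points are our assumptions, not the authors' code.

```python
import numpy as np

def simulate_paths(S0, sigma, delta, rho, r, T, N, n, rng=None):
    """Fill an (n, N, d) array of correlated geometric Brownian motion paths,
    mirroring the array S in the variable list below. The log-Euler step
    (exact for GBM) and the pseudorandom normals are simplifications."""
    rng = np.random.default_rng() if rng is None else rng
    S0, sigma, delta = map(np.asarray, (S0, sigma, delta))
    d, dt = len(S0), T / N
    A = np.linalg.cholesky(np.asarray(rho))          # plays the role of Cholesky(rho)
    drift = (r - delta - 0.5 * sigma ** 2) * dt
    S = np.empty((n, N, d))
    prev = np.tile(S0, (n, 1))
    for k in range(N):
        Z = rng.standard_normal((n, d)) @ A.T        # correlated standard normals (RSN)
        prev = prev * np.exp(drift + sigma * np.sqrt(dt) * Z)
        S[:, k, :] = prev
    return S

def max_call_payoff(S_k, K):
    # payoff of a max-call at one time node: (max_i S_i - K)^+
    return np.maximum(S_k.max(axis=-1) - K, 0.0)
```

The Sobol-based paths \({\widetilde{S}}\) would presumably be generated analogously by feeding Sobol points through the inverse normal CDF (InvN) instead of calling a pseudorandom generator.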
The data variables used in the pseudocode are as follows:
- T is the maturity time of the option
- K is the strike price of the option
- d is the number of underlying assets of the option
- \(S_0\) is a d-vector holding the initial asset prices
- \(\sigma \) is a d-vector holding the asset volatilities
- \(\delta \) is a d-vector holding the dividend yields
- n is the number of simulated price paths used for training
- N is the number of uniform time intervals for [0, T]
- \(K_L\) is the number of simulated price paths used to obtain the option estimate
- a is an N-vector holding the number of bundles at each time node
- \(\rho \) is a \(d \times d\) array holding the correlation matrix
- r is the interest rate
- \(\alpha \) is the learning rate
- L is an N-vector holding the length of the deep neural networks for each time node
- \(\lambda \) is an \(N \times \left( \max a\right) \) array holding the edges of the deep neural networks for each time node and each bundle
- S is an \(n \times N \times d\) array containing the Monte Carlo simulated price paths
- \({\widetilde{S}}\) is a \(\left( \max a\right) \times N \times d\) array containing the Sobol-based Monte Carlo simulated price paths
- \({\overline{S}}\) is a \(K_L \times N \times d\) array containing the Monte Carlo simulated price paths
- h is an \(n \times N\) array holding the dimension-reduced version of the Monte Carlo simulated price paths S
- \({\widetilde{h}}\) is a \(\left( \max a\right) \times N\) array holding the dimension-reduced version of the Sobol-based Monte Carlo simulated price paths \({\widetilde{S}}\)
- \({\overline{h}}\) is a \(K_L \times N\) array holding the dimension-reduced version of the Monte Carlo simulated price paths \({\overline{S}}\)
- p is an \(n \times N\) array holding the payoffs at each price point of the Monte Carlo simulated price paths S
- \({\overline{p}}\) is a \(K_L \times N\) array holding the payoffs at each price point of the Monte Carlo simulated price paths \({\overline{S}}\)
- \({\mathcal {A}}\) is an \(n \times N\) array holding the path indices belonging to each bundle for the Monte Carlo simulated price paths S
- \(\overline{{\mathcal {A}}}\) is a \(K_L \times N\) array holding the path indices belonging to each bundle for the Monte Carlo simulated price paths \({\overline{S}}\)
- \(\tau \) is an \(n \times N\) array holding the estimated optimal stopping time at each time node along each Monte Carlo simulated price path in S
- \({\overline{\tau }}\) is a \(K_L\)-vector holding the estimated optimal stopping time at the initial moment along each Monte Carlo simulated price path in \({\overline{S}}\)
- \(\theta \) is an \(N \times \left( \max a\right) \) array holding pointers to the deep neural networks for each bundle and each time node
- V is the option price estimator
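As announced above, the following Python skeleton (our reading of the flow chart, not the authors' pseudocode) shows how these variables could interact in the backward induction: at each time node the paths are bundled via the reduced state h, a stopping-decision function is fitted per bundle, and the stopping times \(\tau \) and realised payoffs are updated. The names dlsspm_skeleton, train_decision_net, and naive_rule, the quantile bundling rule, and the single-step discount factor are assumptions of this sketch.

```python
import numpy as np

def dlsspm_skeleton(S, h, p, a, discount, train_decision_net):
    """Backward-induction skeleton. S: (n, N, d) paths, h: (n, N) reduced state,
    p: (n, N) payoffs, a: length-N array of bundle counts, discount: one-step
    discount factor, train_decision_net: stands in for the GBP/DNNs step and
    must return a callable {0,1}-valued stopping-decision function."""
    n, N = p.shape
    tau = np.full(n, N - 1)                  # stop at maturity by default
    cash = p[:, N - 1].copy()                # payoff realised at the current stopping time
    nets = [[None] * int(a[i]) for i in range(N)]
    for i in range(N - 2, -1, -1):           # backward induction over time nodes
        # bundle paths by the reduced state h (quantile bundling is our assumption)
        edges = np.quantile(h[:, i], np.linspace(0.0, 1.0, int(a[i]) + 1))
        bundle = np.clip(np.searchsorted(edges, h[:, i], side="right") - 1, 0, int(a[i]) - 1)
        cont = cash * discount ** (tau - i)  # payoff of continuing, discounted to node i
        for j in range(int(a[i])):
            idx = np.where(bundle == j)[0]
            if idx.size == 0:
                continue
            nets[i][j] = train_decision_net(S[idx, i, :], p[idx, i], cont[idx])
            stop = nets[i][j](S[idx, i, :]) > 0.5   # 1 = exercise now
            tau[idx[stop]] = i
            cash[idx[stop]] = p[idx[stop], i]
    return nets, tau, cash

# naive stand-in for the trained decision network, only to make the skeleton executable:
# exercise whenever the immediate payoff beats the discounted continuation payoff
def naive_rule(states, payoff_now, continuation):
    decide = (payoff_now > continuation).astype(float)
    return lambda s, decide=decide: decide
```

Replacing naive_rule with a DNN fitted by the gradient-based routine (GBP) recovers the intended scheme; the price estimator V would then be obtained on the independent \({\overline{S}}\) paths by stopping them according to the fitted decision functions and averaging the discounted payoffs at \({\overline{\tau }}\), which again is our reading of the final step rather than the authors' exact pseudocode.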