Abstract
This paper proposes a deep learning approach for solving optimal stopping problems and pricing high-dimensional American-style options. Because the method partitions the state space, the network structure does not need to be recomputed when the asset price changes, which makes tracking valuations more efficient. We also give a theoretical proof of the existence of a deep learning network that determines the optimal stopping time via state-space partition, present convergence proofs for the estimators, and test the method on Bermudan max-call options.
Data Availability
The data that support this study are available from the corresponding author upon reasonable request.
References
Aliprantis CD, Border KC (2006) Infinite dimensional analysis. Springer, Berlin. https://doi.org/10.1007/3-540-29587-9
Bally V, Pagès G, Printems J (2003) First-order schemes in the numerical quantization method. Math Financ 13(1):1–16. https://doi.org/10.1111/1467-9965.t01-1-00002
Barraquand J, Martineau D (1995) Numerical valuation of high-dimensional multivariate American securities. J Financ Quant Anal 30(3):383–405. https://doi.org/10.2307/2331347
Becker S, Cheridito P, Jentzen A (2019) Deep optimal stopping. J Mach Learn Res 20:2712–2736
Belomestny D (2011) On the rates of convergence of simulation-based optimization algorithms for optimal stopping problems. Ann Appl Probab 21(1):215–239. https://doi.org/10.1214/10-AAP692
Belomestny D, Schoenmakers J, Dickmann F (2013) Multilevel dual approach for pricing American style derivatives. Financ Stoch 17(4):717–742. https://doi.org/10.1007/s00780-013-0208-5
Bottou L, Curtis FE, Nocedal J (2018) Optimization methods for large-scale machine learning. SIAM Rev 60(2):223–311. https://doi.org/10.1137/16M1080173
Boyle PP (1977) Options: a Monte Carlo approach. J Financ Econ 4(3):323–338. https://doi.org/10.1016/0304-405X(77)90005-8
Broadie M, Cao M (2008) Improved lower and upper bound algorithms for pricing American options by simulation. Quant Financ 8(8):845–861. https://doi.org/10.1080/14697680701763086
Caccetta L, Qu B, Zhou G (2011) A globally and quadratically convergent method for absolute value equations. Comput Optim Appl 48(1):45–58. https://doi.org/10.1007/s10589-009-9242-9
Chen N, Glasserman P (2007) Additive and multiplicative duals for American option pricing. Financ Stoch 11(2):153–179. https://doi.org/10.1007/s00780-006-0031-3
Chernogorova TP, Koleva MN, Valkov RL (2018) A two-grid penalty method for American options. Comput Appl Math 37(3):2381–2398. https://doi.org/10.1007/s40314-017-0457-6
Cox JC, Ross SA, Rubinstein M (1979) Option pricing: a simplified approach. J Financ Econ 7(3):229–263. https://doi.org/10.1016/0304-405X(79)90015-1
Dudley RM (1974) Metric entropy of some classes of sets with differentiable boundaries. J Approx Theory 10(3):227–236. https://doi.org/10.1016/0021-9045(74)90120-8
Geske R, Roll R (1984) On valuing American call options with the Black-Scholes European formula. J Financ 39(2):443–455. https://doi.org/10.1111/j.1540-6261.1984.tb02319.x
Haugh MB, Kogan L (2004) Pricing American options: a duality approach. Oper Res 52(2):258–270. https://doi.org/10.1287/opre.1030.0070
Imaizumi M, Fukumizu K (2019) Deep neural networks learn non-smooth functions effectively. In: Proceedings of Machine Learning Research, 22nd International Conference on Artificial Intelligence and Statistics, Naha. 89:869–878
Jain S, Oosterlee CW (2012) Pricing high-dimensional Bermudan options using the stochastic grid method. Int J Comput Math 89(9):1186–1211. https://doi.org/10.1080/00207160.2012.690035
Jin X, Tan HH, Sun J (2007) A state-space partitioning method for pricing high-dimensional American-style options. Math Financ 17(3):399–426. https://doi.org/10.1111/j.1467-9965.2007.00309.x
Jin X, Li X, Tan HH, Wu Z (2013) A computationally efficient state-space partitioning approach to pricing high-dimensional American options via dimension reduction. Eur J Oper Res 231(2):362–370. https://doi.org/10.1016/j.ejor.2013.05.035
Kohler M, Langer S (2021) On the rate of convergence of fully connected deep neural network regression estimates. Ann Statist 49(4):2231–2249. https://doi.org/10.1214/20-AOS2034
Kohler M, Krzyzak A, Todorovic N (2010) Pricing of high-dimensional American options by neural networks. Math Financ 20(3):383–410. https://doi.org/10.1111/j.1467-9965.2010.00404.x
Leshno M, Lin VY, Pinkus A, Schocken S (1993) Multilayer feedforward networks with a nonpolynomial activation function can approximate any function. Neural Netw 6(6):861–867. https://doi.org/10.1016/S0893-6080(05)80131-5
Li X, Wu Z (2006) A semi-analytic method for valuing high-dimensional options on the maximum and minimum of multiple assets. Ann Financ 2(2):179–205. https://doi.org/10.1007/s10436-005-0034-7
Longstaff FA, Schwartz ES (2001) Valuing American options by simulation: a simple least-squares approach. Rev Financ Stud 14(1):113–147. https://doi.org/10.1093/rfs/14.1.113
Merton RC, Brennan MJ, Schwartz ES (1977) The valuation of American put options. J Financ 32(2):449–462. https://doi.org/10.1111/j.1540-6261.1977.tb03284.x
Rogers LCG (2002) Monte Carlo valuation of American options. Math Financ 12(3):271–286. https://doi.org/10.1111/1467-9965.02010
Rogers LCG (2010) Dual valuation and hedging of Bermudan options. SIAM J Financ Math 1(1):604–608. https://doi.org/10.1137/090772198
Schmidhuber J (2015) Deep learning in neural networks: an overview. Neural Netw 61:85–117. https://doi.org/10.1016/j.neunet.2014.09.003
Shrestha A, Mahmood A (2019) Review of deep learning algorithms and architectures. IEEE Access 7:53040–53065. https://doi.org/10.1109/ACCESS.2019.2912200
Sirignano J, Spiliopoulos K (2018) DGM: a deep learning algorithm for solving partial differential equations. J Comput Phys 375:1339–1364. https://doi.org/10.1016/j.jcp.2018.08.029
Tilley JA (1993) Valuing American options in a path simulation model. Trans Soc Actuaries 45:83–104
Tsitsiklis JN, Van Roy B (2001) Regression methods for pricing complex American-style options. IEEE Trans Neural Netw 12(4):694–703. https://doi.org/10.1109/72.935083
Acknowledgements
We thank Professor Xun Li for his valuable comments on the idea of the paper. We thank the associate editor and the reviewers for their helpful feedback that improved this paper. This work is partially supported by the National Key R&D Program of China (grant no. 2023YFA1009200) and the National Natural Science Foundation of China (grant no. 11871244).
Funding
National Key R&D Program of China (2023YFA1009200), National Natural Science Foundation of China (11871244).
Appendices
Appendix A: Structure of deep neural networks and proof of Lemmas
Since the stopping decision functions \(f_i^j, j=1, \ldots , a_i, i=0,1, \ldots , N-1,\) can only take discrete values, we introduce, based on the notion of piecewise smooth functions proposed by Imaizumi and Fukumizu (2019) and the boundary fragment classes developed by Dudley (1974), a special class of functions that take the value 1 on regions of the state space whose boundaries are formed by a series of smooth functions. For that purpose, we first give the definitions of \(\left( p,C\right) \)-smoothness and edge functions.
Definition 1
Let \(p=q+s\) for some \(q \in {\mathbb {N}}_0\) and \(0<s \le 1\). A function \(m: {\mathbb {R}}^d \rightarrow {\mathbb {R}}\) is called \(\left( p, C\right) \)-smooth if, for every \(\alpha =\left( \alpha _1, \ldots , \alpha _d\right) \in {\mathbb {N}}_0^d\) with \(\sum _{j=1}^d \alpha _j=q\), the partial derivative \(\partial ^q m /\left( \partial x_1^{\alpha _1} \cdots \partial x_d^{\alpha _d}\right) \) exists and satisfies
$$\left| \frac{\partial ^q m}{\partial x_1^{\alpha _1} \cdots \partial x_d^{\alpha _d}}\left( {\textbf{x}}\right) - \frac{\partial ^q m}{\partial x_1^{\alpha _1} \cdots \partial x_d^{\alpha _d}}\left( {\textbf{z}}\right) \right| \le C \Vert {\textbf{x}}-{\textbf{z}}\Vert ^s$$
for all \({\textbf{x}}, {\textbf{z}} \in {\mathbb {R}}^d\), where \(\Vert \cdot \Vert \) denotes the Euclidean norm.
Definition 2
Let \(D \in \{1, \ldots , d\}\) and \(x \in {\mathbb {R}}^d\). A function is called a D-edge function if it is an element of the set
For \(\lambda , J, R \in {\mathbb {N}}\) and \({\mathcal {P}} \subseteq [1, \infty ) \times {\mathbb {N}}\), we also define
Obviously, the functions in \({\mathcal {K}}\left( \lambda , J, {\mathcal {P}}\right) \) take the value 1 on some blocks and 0 on the remaining regions, and they may fail to be differentiable, or may even be discontinuous, at the edges of these regions. Thus, \(f_{i}^{j}\left( x\right) \in {\mathcal {K}}\left( \lambda _i, J_i, {\mathcal {P}}\right) \), where \(\lambda _i, \ J_i \in {\mathbb {N}}\) and \({\mathcal {P}} \subseteq [1, \infty ) \times {\mathbb {N}}\). It is worth noting that each edge function in the definition of \(f_i^j(x)\) can have a different smoothness \(p_{n, i}=q_{n, i}+s_{n, i}\) and a different input dimension \(D_{n, i}\), where \(0 \le n \le N\), \(1 \le i \le \lambda _n\), \(\left( p_{n, i}, D_{n, i}\right) \in {\mathcal {P}}\).
Lemma 8
Let \(\lambda , J \in {\mathbb {N}}\), \({\mathcal {P}} \subseteq [1, \infty ) \times {\mathbb {N}}\). For arbitrary \(f\left( x \right) \in {\mathcal {K}}\left( \lambda , J, {\mathcal {P}}\right) \), there exists a set of \(\left( p, C\right) \)-smooth functions \(f_i\left( x \right) , i=1,\ldots ,\) \(\sum _{j=1}^J R_j\), such that
Proof
The result follows directly from the definition of \({\mathcal {K}}\left( \lambda , J, {\mathcal {P}}\right) \). \(\square \)
Lemma 9
For \(x \in \left[ 0,1\right] ^d\), there exists a neural network \(f_{\text {mult}}\) with the network architecture \(\left( \left\lceil \log _2 d\right\rceil , 18 d\right) \) such that
Proof
Based on Lemma 4 and Lemma 20 of Kohler and Langer (2021), there exists a network \(f_{sq}\) with the network architecture \(\left( 1,18\right) \) satisfying \(f_{sq}\left( m, n\right) =|m+n|-|m-n|\) for \(m,n \in \left[ 0,1\right] \). Let
In the first layer of \(f_{\text {mult}}\), we compute
which can be done with \(18 \cdot 2^{w-1} \le 18 d\) neurons. The output of the first layer is a vector of length \(2^{w-1}\). This process is repeated on the resulting vectors until the output is one-dimensional. If \(m=0\), then \(f_{sq}\left( m, n\right) =|n|-|n|=0\). By mathematical induction, if an element of the vector \(x\) equals 0, then \(f_{\text {mult}}(x)=0\). \(\square \)
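For intuition only, the following Python sketch (ours, not part of the paper) traces the pairwise tree reduction described above, writing \(f_{sq}\) through its closed form \(|m+n|-|m-n|\) instead of as a \(\left( 1,18\right) \) ReLU network, and assuming for simplicity that the input length is a power of two; it merely illustrates the zero-propagation property used at the end of the proof.

```python
def f_sq(m, n):
    # closed form of the (1, 18)-architecture block from Kohler and Langer (2021)
    return abs(m + n) - abs(m - n)

def f_mult(x):
    # pairwise tree reduction over ceil(log2 d) layers, as in the proof of Lemma 9;
    # this sketch assumes len(x) is a power of two
    v = list(x)
    while len(v) > 1:
        v = [f_sq(v[2 * i], v[2 * i + 1]) for i in range(len(v) // 2)]
    return v[0]

print(f_mult([0.7, 0.0, 0.9, 0.4]))  # -> 0.0: a zero entry propagates to the output
print(f_mult([0.5, 1.0, 1.0, 1.0]))  # all entries positive -> positive output
```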
Lemma 10
For \(x \in \left[ 0,1\right] \), there exists a neural network \(f_{\text {demo}}\) with the network architecture \(\left( 3,2\right) \) such that
where \(c_2 \ge 1\) is a constant.
Proof
Let \(f_{\text {demo}}\left( x\right) =-\sigma \left( -c_2\sigma \left( x\right) +1\right) +1\), where \(\sigma \) denotes the ReLU activation function. This completes the proof. \(\square \)
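As a quick numerical check (ours, not part of the paper), the snippet below evaluates the formula from the proof with NumPy; with this formula, \(f_{\text {demo}}\) equals 0 for \(x \le 0\), equals \(c_2 x\) on \(\left( 0, 1/c_2\right) \), and equals 1 for \(x \ge 1/c_2\), so it mimics the indicator \({\textbf{1}}_{(0, \infty )}\) outside a transition region of width \(1/c_2\). The test points are arbitrary.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def f_demo(x, c2=100.0):
    # formula from the proof of Lemma 10: -sigma(-c2 * sigma(x) + 1) + 1
    return -relu(-c2 * relu(x) + 1.0) + 1.0

x = np.array([-0.5, 0.0, 0.005, 0.02, 0.5])
print(f_demo(x))  # [0. 0. 0.5 1. 1.]: 0 for x <= 0, c2*x on (0, 1/c2), 1 for x >= 1/c2
```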
Lemma 11
Under Assumption 2, let \(f\in {\mathcal {K}}(\lambda , J, {\mathcal {P}})\), where \(\lambda , J \in {\mathbb {N}}\) and \({\mathcal {P}} \subseteq [1, \infty ) \times {\mathbb {N}}\). Then, for sufficiently large \(Q_i \in {\mathbb {N}}\), \(i=1,\ldots ,\lambda \), there exist \(D_{i}, \ p_{i}, \ q_{i}, \ R_{i}\), \(i=1,\ldots ,\lambda \), and a neural network \(f^\theta \in {\mathcal {F}} \left( L,r\right) \) with the property that
where \(c_3\) is a constant,
Proof
According to Lemma 8, there exists a set of functions \(f_i\left( x \right) \), \(i=1,\ldots ,\sum _{j=1}^J R_j\), each of which is \(\left( p_i,C \right) \)-smooth with \(p_i=q_i+s_i\) for some \(q_i \in {\mathbb {N}}_0\) and \(0<s_i \le 1\). Based on Theorem 2 of Kohler and Langer (2021), there exists \({\hat{\phi }}_{1, i}\) with the network architecture \( \left( L_i,r_i\right) \) such that
where \(c_2^i\) is a constant,
Without loss of generality, for \(1 \le u \le D_{i}\), let \( x_{i_1}=x_{u}\), \(g_i=f_i+x_1\) and \(\phi _{1, i}^{\prime }={\hat{\phi }}_{1, i}+x_{1}\),
For fixed \(\left( x_{2}, \ldots , x_{D_i}\right) \in \left[ -c_1, c_1\right] ^{D_i-1}\), \({\textbf{1}}_{\left\{ f_i>0, {\hat{\phi }}_{1, i}<0\right\} }={\textbf{1}}_{\left[ \phi _{1, i}^{\prime }, g_i\right) }\). Thus,
Similarly,
Notice that \(\left( b \vee 0\right) +\left( \left( -b\right) \vee 0\right) =|b|\); hence, for any \(x \in \left[ -c_1, c_1\right] ^d\), we have
for a constant \(c_4^i>0\).
Let
By Lemmas 9 and 10, given a sufficiently small \(c_2\), \(\prod \left( {\textbf{1}}_{(0, \infty )}\left( x \right) \right) \) and \({\textbf{1}}_{(0, \infty )}\left( x \right) \) can be implemented through the networks \(f_{\text {mult}}(x)\) and \(f_{\text {demo}}\left( x\right) \). Therefore, \(f^\theta \in {\mathcal {F}} \left( L,r\right) \), where
Moreover,
where \(c_3,c_5\) are constants. \(\square \)
Remark 4
Lemma 11 illustrates that a sufficiently deep neural network can approximate the decision function very well. Based on Theorem 2 introduced by Kohler and Langer (2021), a sufficiently wide neural network can also yield similar results. More specifically, replacing L and r with
Lemma 11 still holds.
Appendix B: Flow chart and pseudocode for DLSSPM
In Fig. 2, a flow chart is given to help explain the main steps in the implementation of DLSSPM. Detailed pseudocode for the DLSSPM implementation for multi-asset maximum call options is also provided, where the underlying assets follow correlated geometric Brownian motions. The pseudocode is based on the following assumptions (an illustrative Python sketch of the simulation helpers follows the assumption list, and a sketch of the backward-induction loop follows the variable list):
- The time to maturity is uniformly divided into N intervals
- Points in a Sobol sequence are available by calls to the function Sobol
- Random standard normal numbers are available by calls to the function RSN
- The inverse of the standard normal distribution function is available by calls to the function InvN
- Cholesky factorization can be performed by calls to the function Cholesky
- Gradient-based optimal parameters are available by calls to the function GBP
- Discount factors are available by calls to the function Discount
- The position of the element in a sequence closest to a given number can be found by calls to the function Near
- Fully connected deep neural networks with ReLU as the activation function, where \({\textbf{1}}_{\left( 0,\infty \right) }\) is applied to the output, are available by calls to the function DNNs
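The pseudocode itself is not reproduced in this excerpt. As an illustration only, the sketch below shows one way the simulation-related helpers could be combined to fill the path array S under the correlated geometric Brownian motion model; the function names, the exact simulation step, and the use of pseudorandom normals in place of Sobol points are our assumptions, not the authors' code.

```python
import numpy as np

def simulate_paths(S0, sigma, delta, rho, r, T, N, n, rng=None):
    """Fill an (n, N, d) array of correlated geometric Brownian motion paths,
    mirroring the array S in the variable list below. The log-Euler step
    (exact for GBM) and the pseudorandom normals are simplifications."""
    rng = np.random.default_rng() if rng is None else rng
    S0, sigma, delta = map(np.asarray, (S0, sigma, delta))
    d, dt = len(S0), T / N
    A = np.linalg.cholesky(np.asarray(rho))          # plays the role of Cholesky(rho)
    drift = (r - delta - 0.5 * sigma ** 2) * dt
    S = np.empty((n, N, d))
    prev = np.tile(S0, (n, 1))
    for k in range(N):
        Z = rng.standard_normal((n, d)) @ A.T        # correlated standard normals (RSN)
        prev = prev * np.exp(drift + sigma * np.sqrt(dt) * Z)
        S[:, k, :] = prev
    return S

def max_call_payoff(S_k, K):
    # payoff of a max-call at one time node: (max_i S_i - K)^+
    return np.maximum(S_k.max(axis=-1) - K, 0.0)
```

The Sobol-based paths \({\widetilde{S}}\) would presumably be generated analogously by feeding Sobol points through the inverse normal CDF (InvN) instead of calling a pseudorandom generator.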
The data variables used in the pseudocode are as follows:
- T is the maturity time of the option
- K is the strike price of the option
- d is the number of underlying assets of the option
- \(S_0\) is a d-vector holding the initial asset prices
- \(\sigma \) is a d-vector holding the asset volatilities
- \(\delta \) is a d-vector holding the dividend yields
- n is the number of simulated price paths used for training
- N is the number of uniform time intervals for [0, T]
- \(K_L\) is the number of simulated price paths used to obtain the option estimate
- a is an N-vector holding the number of bundles at each time node
- \(\rho \) is a \(d \times d\) array holding the correlation matrix
- r is the interest rate
- \(\alpha \) is the learning rate
- L is an N-vector holding the length of the deep neural networks for each time node
- \(\lambda \) is an \(N \times \left( \max a\right) \) array holding the edges of the deep neural networks for each time node and each bundle
- S is an \(n \times N \times d\) array containing the Monte Carlo simulated price paths
- \({\widetilde{S}}\) is a \(\left( \max a\right) \times N \times d\) array containing the Sobol-based Monte Carlo simulated price paths
- \({\overline{S}}\) is a \(K_L \times N \times d\) array containing the Monte Carlo simulated price paths
- h is an \(n \times N\) array holding the dimension-reduced version of the Monte Carlo simulated price paths S
- \({\widetilde{h}}\) is a \(\left( \max a\right) \times N\) array holding the dimension-reduced version of the Sobol-based Monte Carlo simulated price paths \({\widetilde{S}}\)
- \({\overline{h}}\) is a \(K_L \times N\) array holding the dimension-reduced version of the Monte Carlo simulated price paths \({\overline{S}}\)
- p is an \(n \times N\) array holding the payoffs at each price point of the Monte Carlo simulated price paths S
- \({\overline{p}}\) is a \(K_L \times N\) array holding the payoffs at each price point of the Monte Carlo simulated price paths \({\overline{S}}\)
- \({\mathcal {A}}\) is an \(n \times N\) array holding the path indices belonging to each bundle for the Monte Carlo simulated price paths S
- \(\overline{{\mathcal {A}}}\) is a \(K_L \times N\) array holding the path indices belonging to each bundle for the Monte Carlo simulated price paths \({\overline{S}}\)
- \(\tau \) is an \(n \times N\) array holding the estimated optimal stopping time at each time node along each Monte Carlo simulated price path in S
- \({\overline{\tau }}\) is a \(K_L\)-vector holding the estimated optimal stopping time at the initial moment along each Monte Carlo simulated price path in \({\overline{S}}\)
- \(\theta \) is an \(N \times \left( \max a\right) \) array holding pointers to the deep neural networks for each bundle and each time node
- V is the option price estimator
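As announced above, the following Python skeleton (our reading of the flow chart, not the authors' pseudocode) shows how these variables could interact in the backward induction: at each time node the paths are bundled via the reduced state h, a stopping-decision function is fitted per bundle, and the stopping times \(\tau \) and realised payoffs are updated. The names dlsspm_skeleton, train_decision_net, and naive_rule, the quantile bundling rule, and the single-step discount factor are assumptions of this sketch.

```python
import numpy as np

def dlsspm_skeleton(S, h, p, a, discount, train_decision_net):
    """Backward-induction skeleton. S: (n, N, d) paths, h: (n, N) reduced state,
    p: (n, N) payoffs, a: length-N array of bundle counts, discount: one-step
    discount factor, train_decision_net: stands in for the GBP/DNNs step and
    must return a callable {0,1}-valued stopping-decision function."""
    n, N = p.shape
    tau = np.full(n, N - 1)                  # stop at maturity by default
    cash = p[:, N - 1].copy()                # payoff realised at the current stopping time
    nets = [[None] * int(a[i]) for i in range(N)]
    for i in range(N - 2, -1, -1):           # backward induction over time nodes
        # bundle paths by the reduced state h (quantile bundling is our assumption)
        edges = np.quantile(h[:, i], np.linspace(0.0, 1.0, int(a[i]) + 1))
        bundle = np.clip(np.searchsorted(edges, h[:, i], side="right") - 1, 0, int(a[i]) - 1)
        cont = cash * discount ** (tau - i)  # payoff of continuing, discounted to node i
        for j in range(int(a[i])):
            idx = np.where(bundle == j)[0]
            if idx.size == 0:
                continue
            nets[i][j] = train_decision_net(S[idx, i, :], p[idx, i], cont[idx])
            stop = nets[i][j](S[idx, i, :]) > 0.5   # 1 = exercise now
            tau[idx[stop]] = i
            cash[idx[stop]] = p[idx[stop], i]
    return nets, tau, cash

# naive stand-in for the trained decision network, only to make the skeleton executable:
# exercise whenever the immediate payoff beats the discounted continuation payoff
def naive_rule(states, payoff_now, continuation):
    decide = (payoff_now > continuation).astype(float)
    return lambda s, decide=decide: decide
```

Replacing naive_rule with a DNN fitted by the gradient-based routine (GBP) recovers the intended scheme; the price estimator V would then be obtained on the independent \({\overline{S}}\) paths by stopping them according to the fitted decision functions and averaging the discounted payoffs at \({\overline{\tau }}\), which again is our reading of the final step rather than the authors' exact pseudocode.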