Abstract
As a fundamental problem in algorithmic trading, portfolio optimization aims to maximize the cumulative return by continuously reallocating capital across various financial assets within a given time period. Recent years have witnessed a shift from traditional machine learning trading algorithms to reinforcement learning algorithms, owing to the latter's natural fit for sequential decision making. However, the exponential growth of imperfect and noisy financial data, which deterministic reinforcement learning strategies must nevertheless rely on, makes it increasingly challenging to consistently obtain a profitable portfolio. In this work, we first reconstruct several deterministic and stochastic reinforcement learning algorithms as benchmarks. On this basis, we introduce a risk-aware reward function to balance risk and return. Most importantly, we propose a novel interpretable stochastic reinforcement learning framework that tailors a stochastic policy parameterized by Gaussian mixtures and a distributional critic realized by quantiles to the problem of portfolio optimization. In our experiments, the proposed algorithm demonstrates superior performance on U.S. stocks, achieving a 63.1% annual rate of return while reducing the maximum drawdown of market value by 10% when back-testing through the stock market crash around March 2020.
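As a rough illustration of the two components named in the abstract, the sketch below (PyTorch) pairs a Gaussian-mixture policy head, whose sample is squashed onto the portfolio simplex by a temperature-τ softmax, with a quantile-based distributional critic. All layer sizes, the numbers of mixture components and quantiles, the temperature, and every identifier are illustrative assumptions, not the authors' exact architecture.

```python
# Minimal sketch (assumed architecture, not the paper's exact networks):
# a Gaussian-mixture stochastic policy producing simplex-constrained portfolio
# weights, and a distributional critic that predicts return quantiles.
import torch
import torch.nn as nn
from torch.distributions import Categorical, Independent, MixtureSameFamily, Normal


class GaussianMixturePolicy(nn.Module):
    def __init__(self, state_dim: int, n_assets: int, n_components: int = 4, tau: float = 1.0):
        super().__init__()
        self.n_assets, self.n_components, self.tau = n_assets, n_components, tau
        self.trunk = nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU())
        self.logits = nn.Linear(128, n_components)                # mixture weights
        self.mu = nn.Linear(128, n_components * n_assets)         # component means
        self.log_sigma = nn.Linear(128, n_components * n_assets)  # component scales

    def forward(self, state: torch.Tensor) -> MixtureSameFamily:
        h = self.trunk(state)
        mix = Categorical(logits=self.logits(h))
        mu = self.mu(h).view(-1, self.n_components, self.n_assets)
        sigma = self.log_sigma(h).view(-1, self.n_components, self.n_assets).clamp(-5, 2).exp()
        return MixtureSameFamily(mix, Independent(Normal(mu, sigma), 1))

    def act(self, state: torch.Tensor):
        dist = self.forward(state)
        raw = dist.sample()                              # unconstrained action in R^h
        weights = torch.softmax(raw / self.tau, dim=-1)  # portfolio weights on the simplex
        return weights, dist.log_prob(raw)


class QuantileCritic(nn.Module):
    """Distributional critic: predicts N quantiles of the return of (state, weights)."""
    def __init__(self, state_dim: int, n_assets: int, n_quantiles: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + n_assets, 128), nn.ReLU(), nn.Linear(128, n_quantiles)
        )

    def forward(self, state: torch.Tensor, weights: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, weights], dim=-1))


if __name__ == "__main__":
    policy, critic = GaussianMixturePolicy(state_dim=10, n_assets=22), QuantileCritic(10, 22)
    s = torch.randn(1, 10)
    w, logp = policy.act(s)
    print(w.sum().item(), critic(s, w).shape)  # weights sum to 1; 32 return quantiles
```

The softmax squashing in act() is what makes the change-of-variables correction sketched in Appendix A necessary when evaluating the policy's log-likelihood.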







Notes
In our work, we focus on these 22 stocks for ease of explanation. This framework is also applicable to other portfolios.
Ethics declarations
Conflict of interest
The authors have no competing interests to declare that are relevant to the content of this article.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Zitao Song and Yining Wang contributed equally to this work.
Appendices
Appendix A: Mathematical details
A.1 Computing the determinant of the Jacobian matrix
For \(f: \mathbb {R}^{h} \to \mathbb {R}^{h}\) defined in (3) of the main manuscript, where h is the dimension of the action, let a = f(x); the Jacobian of this transformation is:
If we define \(v = (a_{1},a_{2},\cdots ,a_{h})^{\mathsf {T}}\) and \(D = \operatorname {diag}(a)\), then we have:
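As a sketch of these expressions, under the assumption that f is a temperature-\(\tau \) softmax, \(a_{i} = \exp (x_{i}/\tau )/{\sum }_{j}\exp (x_{j}/\tau )\) (our reading of (3), not necessarily its exact form), the Jacobian and its determinant would be
\[
J_{f}(x) = \frac {\partial a}{\partial x} = \frac {1}{\tau }\bigl (\operatorname {diag}(a) - aa^{\mathsf {T}}\bigr ) = \frac {1}{\tau }\bigl (D - vv^{\mathsf {T}}\bigr ),
\]
\[
\det J_{f}(x) = \tau ^{-h}\det (D)\bigl (1 - v^{\mathsf {T}}D^{-1}v\bigr ) = \tau ^{-h}\Bigl ({\prod }_{i=1}^{h} a_{i}\Bigr )\Bigl (1 - {\sum }_{i=1}^{h} a_{i}\Bigr ),
\]
where the second equality is the matrix determinant lemma. In particular, \(\vert \det J_{f}\vert \le \tau ^{-h}{\prod }_{i=1}^{h} a_{i}\), which is the bound used in A.2 below.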
A.2 Computing the lower bound of the log-probability
By the change-of-variables formula for random variables, \(p_{\mathcal {A}}(a) = p_{\mathcal {A}^{\prime }}(a^{\prime })\,\vert \det J_{f}(a^{\prime },\tau )\vert ^{-1}\). If we let \(p_{\mathcal {A}}(a):=\pi _{\theta }(a_{t}\vert s_{t})\), then its log-likelihood can be written as:
Therefore, the lower bound of the transformed log-likelihood on the simplex is \(\log p_{\mathcal {A}^{\prime }}(a_{t}^{\prime })+h\log (\tau )-{\sum }_{i=1}^{h}\log ({a_{t}^{i}})\).
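This bound follows from the change-of-variables identity above together with the determinant bound sketched in A.1; under the same softmax assumption, the chain would read
\[
\log \pi _{\theta }(a_{t}\vert s_{t}) = \log p_{\mathcal {A}^{\prime }}(a_{t}^{\prime }) - \log \vert \det J_{f}(a_{t}^{\prime },\tau )\vert \ge \log p_{\mathcal {A}^{\prime }}(a_{t}^{\prime }) + h\log (\tau ) - {\sum }_{i=1}^{h}\log (a_{t}^{i}),
\]
since \(\vert \det J_{f}\vert \le \tau ^{-h}{\prod }_{i=1}^{h}a_{t}^{i}\).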
Appendix B: Supplementary tables
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Song, Z., Wang, Y., Qian, P. et al. From deterministic to stochastic: an interpretable stochastic model-free reinforcement learning framework for portfolio optimization. Appl Intell 53, 15188–15203 (2023). https://doi.org/10.1007/s10489-022-04217-5