Deep reinforcement learning for dynamic strategy interchange in financial markets


Abstract

Financial markets present a complex and dynamic environment, making them an ideal testing ground for artificial intelligence (AI) and machine learning techniques. The integration of quantitative strategies with AI methods, particularly deep reinforcement learning (DRL), has shown promise in enhancing trading performance. Traditional quantitative strategies often rely on backtesting with historical data to validate their effectiveness. However, the inherent volatility and unpredictability of financial markets make it challenging for a single strategy to consistently outperform across different market conditions. In this paper, we introduce Financial Strategy Reinforcement Learning (FSRL), a novel framework leveraging DRL to dynamically select and execute the most appropriate quantitative strategy from a diverse set based on real-time market conditions. This approach departs from conventional methods that depend on a fixed strategy, instead modeling the strategy selection process as a Markov Decision Process (MDP). Within this framework, the DRL agent learns to adaptively switch between strategies, optimizing performance by responding to evolving market scenarios. Our experiments, conducted on two real-world market datasets, demonstrate that FSRL’s dynamic strategy-switching capability not only captures the strengths of individual strategies but also offers a robust and adaptive trading solution. While dynamic strategy selection may not always surpass the best-performing single strategy in every individual metric, it consistently outperforms the weakest strategy and provides a more resilient approach to managing the complexities of financial markets. These findings underscore the potential of DRL in transforming quantitative trading from a multi-factor approach to a multi-strategy paradigm, offering enhanced adaptability and robustness in the face of market volatility.
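
To make the strategy-selection MDP concrete, the following is a minimal sketch (our illustration, not the authors' released code): the observation is a vector of market features, the discrete action picks one of N candidate quantitative strategies, and the reward is the chosen strategy's next-period return. The Gymnasium-style interface, class name, and feature/reward definitions are assumptions for illustration only.

import numpy as np
import gymnasium as gym
from gymnasium import spaces

class StrategySelectionEnv(gym.Env):
    # Illustrative MDP: at each step the agent picks one of N candidate strategies
    # and receives that strategy's next-period return as the reward.

    def __init__(self, market_features, strategy_returns):
        # market_features: (T, F) array of observable state features
        # strategy_returns: (T, N) array of per-period returns of each candidate strategy
        self.features = np.asarray(market_features, dtype=np.float32)
        self.returns = np.asarray(strategy_returns, dtype=np.float32)
        self.action_space = spaces.Discrete(self.returns.shape[1])
        self.observation_space = spaces.Box(
            -np.inf, np.inf, shape=(self.features.shape[1],), dtype=np.float32)
        self.t = 0

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.t = 0
        return self.features[self.t], {}

    def step(self, action):
        reward = float(self.returns[self.t, action])  # follow the chosen strategy this period
        self.t += 1
        terminated = self.t >= len(self.features) - 1
        return self.features[self.t], reward, terminated, False, {}

An off-the-shelf DRL algorithm (for example PPO) could then be trained on such an environment to learn when to switch strategies as market conditions change.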



Data Availability

No datasets were generated or analysed during the current study.

Notes

  1. https://github.com/XingYu-Zhong/FSRL

  2. https://github.com/XingYu-Zhong/FSRL-EXPERIMENTS


Author information


Contributions

All the authors contributed equally to this work.

Corresponding author

Correspondence to Qingzhen Xu.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A: Detailed explanation of financial metrics

In this section, we provide detailed explanations and formulas for the financial metrics used in our model: Sharpe Ratio (SR), Maximum Drawdown (MD), Total Return (TR), Annualized Return (AR), and Annualized Volatility (AV).

1.1 Sharpe ratio (SR)

The Sharpe Ratio measures the risk-adjusted return of a financial asset or portfolio. It is calculated by dividing the excess return (return above the risk-free rate) by the standard deviation of the asset’s return, which represents its risk.

The formula for the Sharpe Ratio is:

$$ SR = \frac{R_p - R_f}{\sigma _p} $$

where:

  • \(R_p\) is the average return of the portfolio or asset,

  • \(R_f\) is the risk-free rate (often based on government bonds),

  • \(\sigma _p\) is the standard deviation of the portfolio’s or asset’s returns, representing risk.

A higher Sharpe Ratio indicates that the asset or portfolio has a better risk-adjusted performance.
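
As a minimal illustration (not taken from the paper), the ratio can be computed from a series of periodic portfolio returns; the function name and use of numpy are our own assumptions.

import numpy as np

def sharpe_ratio(returns, risk_free_rate=0.0):
    # SR = (R_p - R_f) / sigma_p, using the mean and standard deviation of periodic returns
    returns = np.asarray(returns, dtype=float)
    excess = returns - risk_free_rate          # return above the risk-free rate
    return excess.mean() / returns.std(ddof=1)

When the inputs are daily returns, the result is often annualized by multiplying by the square root of 252.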

1.2 Maximum drawdown (MD)

Maximum Drawdown is a measure of the largest peak-to-trough decline in the value of an asset or portfolio over a specified period. It represents the maximum observed loss from a historical high point before a new high is reached.

The formula for Maximum Drawdown is:

$$ MD = \frac{\text{Trough Value} - \text{Peak Value}}{\text{Peak Value}} $$

Maximum Drawdown is typically expressed as a percentage; a drawdown of smaller magnitude (closer to zero) indicates better risk management during downturns.
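
A small sketch of this calculation over an equity curve (illustrative only; numpy assumed):

import numpy as np

def max_drawdown(values):
    # MD = (trough - peak) / peak, measured against the running peak of the equity curve
    values = np.asarray(values, dtype=float)
    running_peak = np.maximum.accumulate(values)   # highest value observed so far
    drawdowns = (values - running_peak) / running_peak
    return drawdowns.min()                         # most negative peak-to-trough decline

For example, max_drawdown([100, 120, 90, 110]) returns -0.25, a 25% decline from the peak of 120 to the trough of 90.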

1.3 Total return (TR)

Total Return measures the overall return of an investment, considering both price appreciation and dividends or interest payments. It represents the full return over a specified time period, including reinvestment of distributions.

The formula for Total Return is:

$$ TR = \frac{P_{end} - P_{start} + D}{P_{start}} $$

where:

  • \(P_{end}\) is the ending price of the asset,

  • \(P_{start}\) is the starting price of the asset,

  • D represents any dividends or distributions received over the period.

Total Return gives a complete picture of the profitability of an investment.
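
As a direct translation of the formula (hypothetical helper, for illustration):

def total_return(p_start, p_end, dividends=0.0):
    # TR = (P_end - P_start + D) / P_start
    return (p_end - p_start + dividends) / p_start

# Example: a position bought at 100, sold at 112, with 3 in dividends -> 15% total return
total_return(100, 112, 3)  # 0.15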

1.4 Annualized return (AR)

Annualized Return expresses the geometric average of the returns generated by an asset or portfolio over a specific time period, scaled to a one-year period. It allows for comparison of returns over different periods.

The formula for Annualized Return is:

$$ AR = \left( \frac{P_{end}}{P_{start}} \right) ^{\frac{1}{n}} - 1 $$

where:

  • \(P_{end}\) is the ending value of the asset or portfolio,

  • \(P_{start}\) is the starting value of the asset or portfolio,

  • n is the number of years in the time period.

Annualized Return helps compare the performance of assets over different time periods by standardizing the return to an annualized basis.
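
For illustration, the formula maps directly to code (hypothetical helper):

def annualized_return(p_start, p_end, years):
    # AR = (P_end / P_start) ** (1 / n) - 1, the geometric average return per year
    return (p_end / p_start) ** (1.0 / years) - 1.0

# Example: doubling an investment over 5 years corresponds to roughly 14.9% per year
annualized_return(100, 200, 5)  # ~0.1487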

1.5 Annualized volatility (AV)

Annualized Volatility is a measure of the dispersion of returns for an asset or portfolio over a given period, expressed on an annual basis. It represents the risk or uncertainty associated with the asset’s return.

The formula for Annualized Volatility is:

$$ AV = \sigma \sqrt{n} $$

where:

  • \(\sigma \) is the standard deviation of the asset’s returns,

  • n is the number of periods in a year (e.g., for daily returns, n would be 252, the number of trading days in a year).

Higher Annualized Volatility indicates higher risk, as it shows greater variability in returns over time.
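
A minimal sketch, assuming daily returns and 252 trading days per year (numpy assumed; not the paper's code):

import numpy as np

def annualized_volatility(returns, periods_per_year=252):
    # AV = sigma * sqrt(n), where sigma is the standard deviation of periodic returns
    return np.asarray(returns, dtype=float).std(ddof=1) * np.sqrt(periods_per_year)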

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Zhong, X., Wei, J., Li, S. et al. Deep reinforcement learning for dynamic strategy interchange in financial markets. Appl Intell 55, 30 (2025). https://doi.org/10.1007/s10489-024-05965-2

