Deep reinforcement learning for dynamic strategy interchange in financial markets


Abstract

Financial markets present a complex and dynamic environment, making them an ideal testing ground for artificial intelligence (AI) and machine learning techniques. The integration of quantitative strategies with AI methods, particularly deep reinforcement learning (DRL), has shown promise in enhancing trading performance. Traditional quantitative strategies often rely on backtesting with historical data to validate their effectiveness. However, the inherent volatility and unpredictability of financial markets make it challenging for a single strategy to consistently outperform across different market conditions. In this paper, we introduce Financial Strategy Reinforcement Learning (FSRL), a novel framework leveraging DRL to dynamically select and execute the most appropriate quantitative strategy from a diverse set based on real-time market conditions. This approach departs from conventional methods that depend on a fixed strategy, instead modeling the strategy selection process as a Markov Decision Process (MDP). Within this framework, the DRL agent learns to adaptively switch between strategies, optimizing performance by responding to evolving market scenarios. Our experiments, conducted on two real-world market datasets, demonstrate that FSRL’s dynamic strategy-switching capability not only captures the strengths of individual strategies but also offers a robust and adaptive trading solution. While dynamic strategy selection may not always surpass the best-performing single strategy in every individual metric, it consistently outperforms the weakest strategy and provides a more resilient approach to managing the complexities of financial markets. These findings underscore the potential of DRL in transforming quantitative trading from a multi-factor approach to a multi-strategy paradigm, offering enhanced adaptability and robustness in the face of market volatility.
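
To make the strategy-selection MDP concrete, the following is a minimal sketch (our illustration, not the authors' released code): the observation is a vector of market features, the discrete action picks one of N candidate quantitative strategies, and the reward is the chosen strategy's next-period return. The Gymnasium-style interface, class name, and feature/reward definitions are assumptions for illustration only.

import numpy as np
import gymnasium as gym
from gymnasium import spaces

class StrategySelectionEnv(gym.Env):
    # Illustrative MDP: at each step the agent picks one of N candidate strategies
    # and receives that strategy's next-period return as the reward.

    def __init__(self, market_features, strategy_returns):
        # market_features: (T, F) array of observable state features
        # strategy_returns: (T, N) array of per-period returns of each candidate strategy
        self.features = np.asarray(market_features, dtype=np.float32)
        self.returns = np.asarray(strategy_returns, dtype=np.float32)
        self.action_space = spaces.Discrete(self.returns.shape[1])
        self.observation_space = spaces.Box(
            -np.inf, np.inf, shape=(self.features.shape[1],), dtype=np.float32)
        self.t = 0

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.t = 0
        return self.features[self.t], {}

    def step(self, action):
        reward = float(self.returns[self.t, action])  # follow the chosen strategy this period
        self.t += 1
        terminated = self.t >= len(self.features) - 1
        return self.features[self.t], reward, terminated, False, {}

An off-the-shelf DRL algorithm (for example PPO) could then be trained on such an environment to learn when to switch strategies as market conditions change.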



Data Availability

No datasets were generated or analysed during the current study.

Notes

  1. https://github.com/XingYu-Zhong/FSRL

  2. https://github.com/XingYu-Zhong/FSRL-EXPERIMENTS


Author information


Contributions

All the authors contributed equally to this work.

Corresponding author

Correspondence to Qingzhen Xu.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A: Detailed explanation of financial metrics

In this section, we provide detailed explanations and formulas for the financial metrics used in our model: Sharpe Ratio (SR), Maximum Drawdown (MD), Total Return (TR), Annualized Return (AR), and Annualized Volatility (AV).

1.1 Sharpe ratio (SR)

The Sharpe Ratio measures the risk-adjusted return of a financial asset or portfolio. It is calculated by dividing the excess return (return above the risk-free rate) by the standard deviation of the asset’s return, which represents its risk.

The formula for the Sharpe Ratio is:

$$ SR = \frac{R_p - R_f}{\sigma _p} $$

where:

  • \(R_p\) is the average return of the portfolio or asset,

  • \(R_f\) is the risk-free rate (often based on government bonds),

  • \(\sigma _p\) is the standard deviation of the portfolio’s or asset’s returns, representing risk.

A higher Sharpe Ratio indicates that the asset or portfolio has a better risk-adjusted performance.
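
As a minimal illustration (not taken from the paper), the ratio can be computed from a series of periodic portfolio returns; the function name and use of numpy are our own assumptions.

import numpy as np

def sharpe_ratio(returns, risk_free_rate=0.0):
    # SR = (R_p - R_f) / sigma_p, using the mean and standard deviation of periodic returns
    returns = np.asarray(returns, dtype=float)
    excess = returns - risk_free_rate          # return above the risk-free rate
    return excess.mean() / returns.std(ddof=1)

When the inputs are daily returns, the result is often annualized by multiplying by the square root of 252.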

1.2 Maximum drawdown (MD)

Maximum Drawdown is a measure of the largest peak-to-trough decline in the value of an asset or portfolio over a specified period. It represents the maximum observed loss from a historical high point before a new high is reached.

The formula for Maximum Drawdown is:

$$ MD = \frac{\text{Trough Value} - \text{Peak Value}}{\text{Peak Value}} $$

Maximum Drawdown is typically expressed as a percentage; a drawdown of smaller magnitude (closer to zero) indicates better risk management during downturns.
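
A small sketch of this calculation over an equity curve (illustrative only; numpy assumed):

import numpy as np

def max_drawdown(values):
    # MD = (trough - peak) / peak, measured against the running peak of the equity curve
    values = np.asarray(values, dtype=float)
    running_peak = np.maximum.accumulate(values)   # highest value observed so far
    drawdowns = (values - running_peak) / running_peak
    return drawdowns.min()                         # most negative peak-to-trough decline

For example, max_drawdown([100, 120, 90, 110]) returns -0.25, a 25% decline from the peak of 120 to the trough of 90.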

1.3 Total return (TR)

Total Return measures the overall return of an investment, considering both price appreciation and dividends or interest payments. It represents the full return over a specified time period, including reinvestment of distributions.

The formula for Total Return is:

$$ TR = \frac{P_{end} - P_{start} + D}{P_{start}} $$

where:

  • \(P_{end}\) is the ending price of the asset,

  • \(P_{start}\) is the starting price of the asset,

  • D represents any dividends or distributions received over the period.

Total Return gives a complete picture of the profitability of an investment.
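
As a direct translation of the formula (hypothetical helper, for illustration):

def total_return(p_start, p_end, dividends=0.0):
    # TR = (P_end - P_start + D) / P_start
    return (p_end - p_start + dividends) / p_start

# Example: a position bought at 100, sold at 112, with 3 in dividends -> 15% total return
total_return(100, 112, 3)  # 0.15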

1.4 Annualized return (AR)

Annualized Return expresses the geometric average of the returns generated by an asset or portfolio over a specific time period, scaled to a one-year period. It allows for comparison of returns over different periods.

The formula for Annualized Return is:

$$ AR = \left( \frac{P_{end}}{P_{start}} \right) ^{\frac{1}{n}} - 1 $$

where:

  • \(P_{end}\) is the ending value of the asset or portfolio,

  • \(P_{start}\) is the starting value of the asset or portfolio,

  • n is the number of years in the time period.

Annualized Return helps compare the performance of assets over different time periods by standardizing the return to an annualized basis.
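
For illustration, the formula maps directly to code (hypothetical helper):

def annualized_return(p_start, p_end, years):
    # AR = (P_end / P_start) ** (1 / n) - 1, the geometric average return per year
    return (p_end / p_start) ** (1.0 / years) - 1.0

# Example: doubling an investment over 5 years corresponds to roughly 14.9% per year
annualized_return(100, 200, 5)  # ~0.1487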

1.5 Annualized volatility (AV)

Annualized Volatility is a measure of the dispersion of returns for an asset or portfolio over a given period, expressed on an annual basis. It represents the risk or uncertainty associated with the asset’s return.

The formula for Annualized Volatility is:

$$ AV = \sigma \sqrt{n} $$

where:

  • \(\sigma \) is the standard deviation of the asset’s returns,

  • n is the number of periods in a year (e.g., for daily returns, n would be 252, the number of trading days in a year).

Higher Annualized Volatility indicates higher risk, as it shows greater variability in returns over time.
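
A minimal sketch, assuming daily returns and 252 trading days per year (numpy assumed; not the paper's code):

import numpy as np

def annualized_volatility(returns, periods_per_year=252):
    # AV = sigma * sqrt(n), where sigma is the standard deviation of periodic returns
    return np.asarray(returns, dtype=float).std(ddof=1) * np.sqrt(periods_per_year)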

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Zhong, X., Wei, J., Li, S. et al. Deep reinforcement learning for dynamic strategy interchange in financial markets. Appl Intell 55, 30 (2025). https://doi.org/10.1007/s10489-024-05965-2

