Abstract
Financial markets present a complex and dynamic environment, making them an ideal testing ground for artificial intelligence (AI) and machine learning techniques. The integration of quantitative strategies with AI methods, particularly deep reinforcement learning (DRL), has shown promise in enhancing trading performance. Traditional quantitative strategies often rely on backtesting with historical data to validate their effectiveness. However, the inherent volatility and unpredictability of financial markets make it challenging for a single strategy to consistently outperform across different market conditions. In this paper, we introduce Financial Strategy Reinforcement Learning (FSRL), a novel framework leveraging DRL to dynamically select and execute the most appropriate quantitative strategy from a diverse set based on real-time market conditions. This approach departs from conventional methods that depend on a fixed strategy, instead modeling the strategy selection process as a Markov Decision Process (MDP). Within this framework, the DRL agent learns to adaptively switch between strategies, optimizing performance by responding to evolving market scenarios. Our experiments, conducted on two real-world market datasets, demonstrate that FSRL’s dynamic strategy-switching capability not only captures the strengths of individual strategies but also offers a robust and adaptive trading solution. While dynamic strategy selection may not always surpass the best-performing single strategy in every individual metric, it consistently outperforms the weakest strategy and provides a more resilient approach to managing the complexities of financial markets. These findings underscore the potential of DRL in transforming quantitative trading from a multi-factor approach to a multi-strategy paradigm, offering enhanced adaptability and robustness in the face of market volatility.
Data Availability
No datasets were generated or analysed during the current study.
Author information
Contributions
All the authors contributed equally to this work.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix A: Detailed explanation of financial metrics
In this section, we provide detailed explanations and formulas for the financial metrics used in our model: Sharpe Ratio (SR), Maximum Drawdown (MD), Total Return (TR), Annualized Return (AR), and Annualized Volatility (AV).
1.1 Sharpe ratio (SR)
The Sharpe Ratio measures the risk-adjusted return of a financial asset or portfolio. It is calculated by dividing the excess return (return above the risk-free rate) by the standard deviation of the asset’s return, which represents its risk.
The formula for the Sharpe Ratio is:
\[ SR = \frac{R_p - R_f}{\sigma_p} \]
where:
- \(R_p\) is the average return of the portfolio or asset,
- \(R_f\) is the risk-free rate (often based on government bonds),
- \(\sigma_p\) is the standard deviation of the portfolio's or asset's returns, representing risk.
A higher Sharpe Ratio indicates that the asset or portfolio has a better risk-adjusted performance.
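For illustration, a minimal Python sketch of this calculation is given below; it is not part of the paper's implementation, and the function name and sample return values are ours.

```python
import numpy as np

# Illustrative sketch: Sharpe Ratio from per-period portfolio returns,
# following SR = (R_p - R_f) / sigma_p. `risk_free` is the per-period
# risk-free rate (0 by default).
def sharpe_ratio(returns, risk_free=0.0):
    returns = np.asarray(returns, dtype=float)
    return (returns.mean() - risk_free) / returns.std(ddof=1)

# Hypothetical daily returns of a portfolio.
daily = [0.001, -0.002, 0.003, 0.0005, -0.001]
print(sharpe_ratio(daily))
```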
1.2 Maximum drawdown (MD)
Maximum Drawdown is a measure of the largest peak-to-trough decline in the value of an asset or portfolio over a specified period. It represents the maximum observed loss from a historical high point before a new high is reached.
The formula for Maximum Drawdown is:
\[ MD = \frac{P_{peak} - P_{trough}}{P_{peak}} \]
where \(P_{peak}\) is the highest value reached before the decline and \(P_{trough}\) is the lowest value reached before a new high is attained.
Maximum Drawdown is typically expressed as a percentage, and a lower Maximum Drawdown indicates better performance in managing risk during downturns.
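A minimal Python sketch of the running-peak computation follows; it is illustrative only, and the function name and sample value series are ours.

```python
import numpy as np

# Illustrative sketch: Maximum Drawdown from a series of portfolio values,
# expressed as a fraction of the running peak.
def max_drawdown(values):
    values = np.asarray(values, dtype=float)
    running_peak = np.maximum.accumulate(values)      # highest value seen so far
    drawdowns = (running_peak - values) / running_peak
    return drawdowns.max()                             # largest peak-to-trough decline

# Hypothetical portfolio values; result is (120 - 80) / 120 ≈ 0.333.
print(max_drawdown([100, 120, 90, 110, 80, 130]))
```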
1.3 Total return (TR)
Total Return measures the overall return of an investment, considering both price appreciation and dividends or interest payments. It represents the full return over a specified time period, including reinvestment of distributions.
The formula for Total Return is:
\[ TR = \frac{P_{end} - P_{start} + D}{P_{start}} \]
where:
- \(P_{end}\) is the ending price of the asset,
- \(P_{start}\) is the starting price of the asset,
- \(D\) represents any dividends or distributions received over the period.
Total Return gives a complete picture of the profitability of an investment.
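The following short Python sketch illustrates the calculation with hypothetical prices and a dividend; the function name and numbers are ours, not the paper's.

```python
# Illustrative sketch: Total Return including dividends,
# following TR = (P_end - P_start + D) / P_start.
def total_return(p_start, p_end, dividends=0.0):
    return (p_end - p_start + dividends) / p_start

# Hypothetical example: buy at 100, end at 112, receive 3 in dividends -> 15%.
print(total_return(p_start=100.0, p_end=112.0, dividends=3.0))
```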
1.4 Annualized return (AR)
Annualized Return expresses the geometric average of the returns generated by an asset or portfolio over a specific time period, scaled to a one-year period. It allows for comparison of returns over different periods.
The formula for Annualized Return is:
\[ AR = \left( \frac{P_{end}}{P_{start}} \right)^{\frac{1}{n}} - 1 \]
where:
- \(P_{end}\) is the ending value of the asset or portfolio,
- \(P_{start}\) is the starting value of the asset or portfolio,
- \(n\) is the number of years in the time period.
Annualized Return helps compare the performance of assets over different time periods by standardizing the return to an annualized basis.
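A minimal Python sketch of the geometric annualization is shown below; the function name and example values are illustrative assumptions.

```python
# Illustrative sketch: Annualized Return as the geometric average growth rate,
# following AR = (P_end / P_start)^(1/n) - 1.
def annualized_return(p_start, p_end, years):
    return (p_end / p_start) ** (1.0 / years) - 1.0

# Hypothetical example: 100 grows to 150 over 3 years -> about 14.5% per year.
print(annualized_return(p_start=100.0, p_end=150.0, years=3))
```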
1.5 Annualized volatility (AV)
Annualized Volatility is a measure of the dispersion of returns for an asset or portfolio over a given period, expressed on an annual basis. It represents the risk or uncertainty associated with the asset’s return.
The formula for Annualized Volatility is:
\[ AV = \sigma \sqrt{n} \]
where:
- \(\sigma\) is the standard deviation of the asset's per-period returns,
- \(n\) is the number of periods in a year (e.g., for daily returns, \(n = 252\), the number of trading days in a year).
Higher Annualized Volatility indicates higher risk, as it shows greater variability in returns over time.
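For completeness, a short Python sketch of the annualization from daily returns follows; as with the other snippets, the function name and sample data are ours.

```python
import numpy as np

# Illustrative sketch: Annualized Volatility from per-period returns,
# following AV = sigma * sqrt(n), with n = 252 trading days for daily data.
def annualized_volatility(returns, periods_per_year=252):
    sigma = np.std(np.asarray(returns, dtype=float), ddof=1)  # per-period std
    return sigma * np.sqrt(periods_per_year)

# Hypothetical daily returns of a portfolio.
daily = [0.001, -0.002, 0.003, 0.0005, -0.001]
print(annualized_volatility(daily))
```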
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhong, X., Wei, J., Li, S. et al. Deep reinforcement learning for dynamic strategy interchange in financial markets. Appl Intell 55, 30 (2025). https://doi.org/10.1007/s10489-024-05965-2
DOI: https://doi.org/10.1007/s10489-024-05965-2