An adaptive portfolio trading system: A risk-return portfolio optimization using recurrent reinforcement learning with expected maximum drawdown
Introduction
In financial investing, a general goal is to dynamically allocate a set of assets so as to maximize returns over time while simultaneously minimizing risk. For investors it is essential to be able to invest in a portfolio that satisfies their preset goals by building an optimal portfolio initially and subsequently rebalancing it optimally. Portfolio theory began with the mean-variance optimization of Markowitz (1952), who proposed selecting portfolios by maximizing expected return while minimizing risk as measured by the covariance matrix of returns. Rebalancing re-optimizes the portfolio weights over a predefined time horizon. The application of dynamic programming methods to dynamic asset allocation was originally introduced by Bertsekas (1995). Because of the curse of dimensionality in dynamic programming, however, investors and scholars normally apply automated self-learning algorithms instead when designing optimal trading strategies. Reinforcement learning, a form of approximate dynamic programming and a subfield of machine learning introduced by Sutton and Barto (1998), has been broadly applied by investors and researchers to building strategic asset allocation decision systems (Dempster & Leemans, 2006; Feuerriegel & Prendinger, 2016; Gold, 2003a; Tan, Quek, & Cheng, 2011).
In this paper, we apply the recurrent reinforcement learning (RRL) method with a statistically coherent, downside-risk-adjusted performance objective function to simultaneously generate both buy/sell signals and optimal asset allocation weights. Moody, Wu, Liao, and Saffell (1998) introduced recurrent reinforcement learning for building a trading system, examining the performance effect of using the Sharpe ratio versus several economic utility functions. They concluded that the Sharpe ratio behaves like an adaptive utility function, and that maximizing the differential Sharpe ratio as an immediate reward in an online learning mode significantly outperforms maximizing profits directly. Most of the subsequent work (e.g., Gold, 2003b; Maringer & Ramtohul, 2010; 2012) focused on equally weighted portfolios. Although both Moody et al. (1998) and Bertoluzzo and Corazza (2008) mentioned potential drawdown effects on RRL performance, neither thoroughly examined the actual effects.
Many practitioners adjust commonly accepted theoretical models to fit their particular situations, or develop measures focused on their specific interests, often neglecting the theoretical aspects or assumptions behind those adjustments, as with the safety-first risk measures (i.e., the Sharpe ratio, the Sortino ratio, the Sterling ratio, and the Calmar ratio). Bhansali (2007) and Zimmermann, Drobetz, and Oertmann (2003) noted that many risk measures based on covariance matrices estimated from historical data fail notoriously when they are needed most. They agreed that the difference in volatility and correlations between up- and down-market environments implies that the risk-reduction potential is limited, leaving such measures incapable of foreseeing stress-type events. We argue that large drawdowns usually lead to fund redemption, and hence should lead to very different optimal decisions. In this paper, we extend the variable-weight, long-only RRL approach of Moody and Saffell (2001) to a long-short approach and examine the effect of the expected maximum drawdown E(MDD) (Magdon-Ismail & Atiya, 2004) on portfolio performance in joint interaction with transaction costs. Magdon-Ismail, Atiya, Pratap, and Abu-Mostafa (2003) and Magdon-Ismail and Atiya (2004) provided a statistically coherent downside risk measure, the Calmar ratio with the expected maximum drawdown, which gives us a theoretical basis for applying this downside risk measure as a differentiable objective function in RRL. This E(MDD)-based Calmar ratio (Magdon-Ismail et al., 2003) is distinctly different from the exponential moving average drawdown approach used by Moody and Saffell (2001).
More specifically, we compare the Calmar ratio with the Sharpe ratio, whose risk-adjusted measure of performance is calculated from the standard deviation of returns over a predefined time horizon. Furthermore, we use the recurrent reinforcement learning method with two different objective functions through which we incorporate different risk considerations. We show that recurrent reinforcement learning with variable-weight asset allocation gives superior performance when applied to a set of highly liquid exchange-traded funds (ETFs) under various transaction cost considerations over a five-year period. We also document that when expected maximum drawdowns are considered, the RRL can generate a portfolio superior to those generated by the average-deviation performance measure, the Sharpe ratio. This confirms the intuition that a reasonably low MDD is critical to the success of any fund.
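To make the comparison concrete, the two ratios can be computed from a realized return series as below. This is a minimal illustrative sketch: the paper's Calmar-ratio objective actually uses the analytic expected maximum drawdown E(MDD) of Magdon-Ismail and Atiya (2004), whereas this sketch uses the realized maximum drawdown, and the function names are ours.

```python
import numpy as np

def max_drawdown(wealth):
    """Largest peak-to-trough decline of a wealth curve, as a fraction."""
    peaks = np.maximum.accumulate(wealth)
    return np.max((peaks - wealth) / peaks)

def sharpe_ratio(returns):
    """Mean return divided by the standard deviation of returns."""
    return np.mean(returns) / np.std(returns)

def calmar_ratio(returns):
    """Cumulative return divided by the realized maximum drawdown."""
    wealth = np.cumprod(1.0 + np.asarray(returns))
    return (wealth[-1] - 1.0) / max_drawdown(wealth)
```

Because the Sharpe ratio penalizes upside and downside deviation symmetrically while the Calmar ratio penalizes only the worst cumulative loss, the two objectives can rank the same strategies very differently.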
In addition, we propose a portfolio allocation and rebalancing system using RRL with E(MDD) as the performance measure; this trading system jointly considers transaction costs and market conditions and automatically retrains the system parameters to achieve better performance. We show that a trading system with a stop-loss based on the market volatility regime enables the portfolio to endure higher transaction costs: the stop-loss strategy exits the market when volatility is high, retrains the parameters of the signal-generating process, and generates new signals to reenter the market. Such a trading decision system adapts to market conditions and is more resilient to transaction cost shocks.
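A volatility-regime stop-loss of this kind can be sketched as a rolling-volatility filter. The window length and threshold below are illustrative assumptions, not the paper's calibrated values, and the retraining step of the full system is omitted.

```python
import numpy as np

def stop_loss_flags(returns, window=20, vol_cap=0.02):
    """Mark time steps on which the system stands aside: it exits when
    rolling volatility exceeds vol_cap and reenters once volatility
    recedes (after retraining, in the full system)."""
    returns = np.asarray(returns)
    flags = np.zeros(len(returns), dtype=bool)
    for t in range(window, len(returns)):
        flags[t] = np.std(returns[t - window:t]) > vol_cap
    return flags
```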
The rest of the paper is organized as follows. In Section 2, we review existing work on dynamic portfolio optimization using reinforcement learning methods. We introduce the expected maximum drawdown and its application to RRL in Section 3. We apply the RRL based portfolio rebalancing approach to a set of ETFs to compare the cost effect of the Sharpe ratio vs. the Calmar ratio using RRL in Section 4. Section 5 conducts a final analysis comparing the performance of the proposed risk-return portfolio optimization with that of two hedge fund indices, and Section 6 concludes the study and identifies some future work.
Section snippets
Literature review
Machine learning algorithms are widely used for financial market prediction and portfolio construction, especially in automated trading strategies. Sutton, Barto, and Williams (1992) first introduced the reinforcement learning method (Q-learning) and analytically proved its capabilities for one class of adaptive optimal control problems. Recurrent reinforcement learning was introduced by Moody et al. (1998), where it was applied to stock trading as a learning algorithm, and they…
Methodology
In this paper, we use the recurrent reinforcement learning method for portfolio optimization with different risk considerations, expressed through two objective functions. Following Moody et al. (1998), we use the differential Sharpe ratio for dynamic optimization of trading system performance. We use performance functions both to improve the convergence of the learning process and to adapt to changing market conditions during live trading (see Fig. 1). During this process, the parameter updates can be done during…
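The differential Sharpe ratio of Moody et al. (1998) admits a simple online form, built from exponential moving estimates A and B of the first and second moments of returns. The adaptation rate eta and the initialization below are illustrative assumptions.

```python
import numpy as np

def differential_sharpe(returns, eta=0.01):
    """Online differential Sharpe ratio (Moody et al., 1998):
    D_t = (B*dA - 0.5*A*dB) / (B - A**2)**1.5, where A and B are
    exponential moving estimates of the first and second moments."""
    A, B = returns[0], returns[0] ** 2   # illustrative initialization
    out = []
    for r in returns[1:]:
        dA, dB = r - A, r ** 2 - B
        denom = (B - A ** 2) ** 1.5
        out.append((B * dA - 0.5 * A * dB) / denom if denom > 0 else 0.0)
        A += eta * dA                    # update moving estimates
        B += eta * dB
    return np.array(out)
```

Because D_t depends only on the latest return and the two running estimates, it serves as an immediate reward for online gradient updates rather than requiring the whole return history at each step.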
Trading algorithms comparison
In this section, we first compare three performance ratios as three different objective functions for the model. This results in three trading algorithms producing different trading decisions for the same set of assets, so that we can readily assess the merits of each performance ratio in generating trading signals. The resulting portfolio rebalancing methods are: the Sharpe ratio RRL (SR-RRL), the Sterling ratio RRL (TR-RRL), and the Calmar ratio RRL (CR-RRL).
We show…
Trading system and discussion
We develop an adaptive trading system based on recurrent reinforcement learning using three different objective functions. The recurrent reinforcement learning system is a recursive learning system in which the system learns from every output at every time step. In this system, the trader can select the objective function best suited to the assets in their portfolio; the system parameters are trained according to the chosen objective function. We have introduced three objective functions…
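The recursive signal generation at the heart of such a system can be sketched as follows. The lookback length and parameter values are placeholders; in the actual system the weights are trained by gradient ascent on the chosen differential performance ratio.

```python
import numpy as np

def rrl_signal(prices, w, u, b, lookback=8):
    """Recurrent trading signal F_t = tanh(w . x_t + u*F_{t-1} + b).
    x_t holds the last `lookback` price changes; feeding the previous
    position F_{t-1} back in lets the learner account for the cost of
    changing positions. F_t in (-1, 1) acts as a long-short weight."""
    r = np.diff(np.asarray(prices, dtype=float))
    F = np.zeros(len(r) + 1)             # F[0] is the flat starting position
    for t in range(lookback, len(r)):
        x = r[t - lookback:t]            # recent price-change window
        F[t + 1] = np.tanh(np.dot(w, x) + u * F[t] + b)
    return F[1:]
```

The tanh output makes the position a continuous variable weight rather than a discrete buy/sell flag, which is what allows the long-short variable-weight allocation discussed above.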
Conclusion
In this paper, we use the recurrent reinforcement learning method to solve a dynamic portfolio optimization problem in which we develop four portfolios using RRL and compare them with each other and with a buy-and-hold portfolio. We use RRL methods to optimize the portfolio weights and rebalance the portfolio over a predefined time horizon. We compare the differentials of the Sharpe ratio and the Calmar ratio as objective functions in the recurrent reinforcement learning process and examine the…
References (40)
- et al. (2011). A decision support system for strategic asset allocation. Decision Support Systems.
- et al. (2016). Computational intelligence and financial markets: A survey and future directions. Expert Systems with Applications.
- et al. (2000). Heuristics for cardinality constrained portfolio optimisation. Computers & Operations Research.
- Dempster & Leemans (2006). An automated FX trading system using adaptive reinforcement learning. Expert Systems with Applications.
- et al. (2014). Intelligent trading of seasonal effects: A decision support algorithm based on reinforcement learning. Decision Support Systems.
- Feuerriegel & Prendinger (2016). News-based trading strategies. Decision Support Systems.
- et al. (2014). A learning-guided multi-objective evolutionary algorithm for constrained portfolio optimization. Applied Soft Computing.
- et al. (2015). A hybrid approach to portfolio composition based on fundamental and technical indicators. Expert Systems with Applications.
- Tan, Quek, & Cheng (2011). Stock trading with cycles: A financial application of ANFIS and reinforcement learning. Expert Systems with Applications.
- Bertoluzzo & Corazza (2008). Financial trading systems: Is recurrent reinforcement learning the way? Reflexing Interfaces: The Complex Coevolution of Information Technology Ecosystems.
- Bertsekas (1995). Dynamic programming and optimal control.
- Bhansali (2007). Putting economics (back) into quantitative models. The Journal of Portfolio Management.
- Institutional equity trading costs: NYSE versus Nasdaq. The Journal of Finance.
- Beyond technical analysis: How to develop and implement a winning trading system.
- The mean-variance cardinality constrained portfolio optimization problem using a local search-based multi-objective evolutionary algorithm. Applied Intelligence.
- Implementing a simple rule for dynamic stop-loss strategies. The Journal of Investing.
- Optimal versus naive diversification: How inefficient is the 1/N portfolio strategy? Review of Financial Studies.
- Gold (2003). FX trading via recurrent reinforcement learning. Proceedings of the 2003 IEEE International Conference on Computational Intelligence for Financial Engineering (CIFEr).
- Gold (2003). FX trading via recurrent reinforcement learning. IEEE/IAFE Conference on Computational Intelligence for Financial Engineering (CIFEr) Proceedings.
- Maringer & Ramtohul (2011). Application of stochastic recurrent reinforcement learning to index trading. ESANN 2011 Proceedings, 19th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning.