An adaptive portfolio trading system: A risk-return portfolio optimization using recurrent reinforcement learning with expected maximum drawdown
Introduction
In financial investing, a general goal is to dynamically allocate a set of assets so as to maximize returns over time while simultaneously minimizing risk. For investors it is essential to be able to invest in a portfolio that satisfies their preset goals by building an optimal portfolio initially and subsequently rebalancing it optimally. Portfolio theory began with the mean-variance optimization of Markowitz (1952), who proposed selecting portfolios by maximizing expected return while minimizing risk as measured by the covariance matrix of returns. Rebalancing re-optimizes the portfolio weights over a predefined time horizon. The application of dynamic programming methods to dynamic asset allocation was originally introduced by Bertsekas (1995). Because of the curse of dimensionality in dynamic programming, however, investors and scholars normally apply automated self-learning algorithms instead when designing optimal trading strategies. Reinforcement learning, a form of approximate dynamic programming and a subfield of machine learning introduced by Sutton and Barto (1998), has been broadly applied by investors and researchers to building strategic asset allocation decision systems (Dempster & Leemans, 2006; Feuerriegel & Prendinger, 2016; Gold, 2003a; Tan, Quek, & Cheng, 2011).
In this paper, we apply the recurrent reinforcement learning (RRL) method with a statistically coherent, downside-risk-adjusted performance objective function to simultaneously generate both buy/sell signals and optimal asset allocation weights. Moody, Wu, Liao, and Saffell (1998) introduced recurrent reinforcement learning for building a trading system, examining the performance effect of using the Sharpe ratio versus several economic utility functions. They concluded that the Sharpe ratio behaves like an adaptive utility function, and that maximizing the differential Sharpe ratio as an immediate reward in an online learning mode significantly outperforms maximizing profits directly. Most of the subsequent work (e.g., Gold, 2003b; Maringer & Ramtohul, 2010; 2012) focused on equally weighted portfolios. Although both Moody et al. (1998) and Bertoluzzo and Corazza (2008) mentioned potential drawdown effects on RRL performance, neither thoroughly examined the actual effects.
Many practitioners adjust commonly accepted theoretical models to fit their particular situations, or develop measures focused on their specific interests, often neglecting the theoretical aspects or assumptions behind those adjustments, as with the safety-first risk measures (i.e., the Sharpe ratio, the Sortino ratio, the Sterling ratio, and the Calmar ratio). Bhansali (2007) and Zimmermann, Drobetz, and Oertmann (2003) noted that many risk measures based on covariance matrices estimated from historical data fail notoriously when they are needed most. They agreed that the difference in volatility and correlations between up- and down-market environments implies that the risk-reduction potential is limited, leaving such measures incapable of foreseeing stress-type events. We argue that large drawdowns usually lead to fund redemption, and hence should lead to very different optimal decisions. In this paper, we extend the variable-weight, long-only RRL approach of Moody and Saffell (2001) to a long-short approach and examine the effect of the expected maximum drawdown E(MDD) (Magdon-Ismail & Atiya, 2004) on portfolio performance in joint interaction with transaction costs. Magdon-Ismail, Atiya, Pratap, and Abu-Mostafa (2003) and Magdon-Ismail and Atiya (2004) provided a statistically coherent downside risk measure, the Calmar ratio with the expected maximum drawdown, which gives us a theoretical basis for applying this downside risk measure as a differentiable objective function in RRL. This E(MDD)-based Calmar ratio (Magdon-Ismail et al., 2003) is distinctly different from the exponential moving average drawdown approach used by Moody and Saffell (2001).
More specifically, we compare the Calmar ratio with the Sharpe ratio, whose risk-adjusted measure of performance is calculated from the standard deviation of returns over a predefined time horizon. Furthermore, we use the recurrent reinforcement learning method with two different objective functions through which we incorporate different risk considerations. We show that recurrent reinforcement learning with variable-weight asset allocation gives superior performance when applied to a set of highly liquid exchange-traded funds (ETFs) under various transaction cost considerations over a five-year period. We also document that when expected maximum drawdowns are considered, the RRL can generate a portfolio superior to those generated by the average-deviation performance measure, the Sharpe ratio. This confirms the intuition that a reasonably low MDD is critical to the success of any fund.
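To make the comparison concrete, the two ratios can be computed from a realized return series as below. This is a minimal illustrative sketch: the paper's Calmar-ratio objective actually uses the analytic expected maximum drawdown E(MDD) of Magdon-Ismail and Atiya (2004), whereas this sketch uses the realized maximum drawdown, and the function names are ours.

```python
import numpy as np

def max_drawdown(wealth):
    """Largest peak-to-trough decline of a wealth curve, as a fraction."""
    peaks = np.maximum.accumulate(wealth)
    return np.max((peaks - wealth) / peaks)

def sharpe_ratio(returns):
    """Mean return divided by the standard deviation of returns."""
    return np.mean(returns) / np.std(returns)

def calmar_ratio(returns):
    """Cumulative return divided by the realized maximum drawdown."""
    wealth = np.cumprod(1.0 + np.asarray(returns))
    return (wealth[-1] - 1.0) / max_drawdown(wealth)
```

Because the Sharpe ratio penalizes upside and downside deviation symmetrically while the Calmar ratio penalizes only the worst cumulative loss, the two objectives can rank the same strategies very differently.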
In addition, we propose a portfolio allocation and rebalancing system using RRL with E(MDD) as the performance measure; this trading system jointly considers transaction costs and market conditions and automatically retrains the system parameters to achieve better performance. We show that a trading system with a stop-loss based on the market volatility regime enables the portfolio to endure higher transaction costs: the stop-loss strategy exits the market when volatility is high, retrains the parameters of the signal-generating process, and generates new signals to reenter the market. Such a trading decision system adapts to market conditions and is more resilient to transaction cost shocks.
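A volatility-regime stop-loss of this kind can be sketched as a rolling-volatility filter. The window length and threshold below are illustrative assumptions, not the paper's calibrated values, and the retraining step of the full system is omitted.

```python
import numpy as np

def stop_loss_flags(returns, window=20, vol_cap=0.02):
    """Mark time steps on which the system stands aside: it exits when
    rolling volatility exceeds vol_cap and reenters once volatility
    recedes (after retraining, in the full system)."""
    returns = np.asarray(returns)
    flags = np.zeros(len(returns), dtype=bool)
    for t in range(window, len(returns)):
        flags[t] = np.std(returns[t - window:t]) > vol_cap
    return flags
```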
The rest of the paper is organized as follows. In Section 2, we review existing work on dynamic portfolio optimization using reinforcement learning methods. We introduce the expected maximum drawdown and its application to RRL in Section 3. We apply the RRL based portfolio rebalancing approach to a set of ETFs to compare the cost effect of the Sharpe ratio vs. the Calmar ratio using RRL in Section 4. Section 5 conducts a final analysis comparing the performance of the proposed risk-return portfolio optimization with that of two hedge fund indices, and Section 6 concludes the study and identifies some future work.
Section snippets
Literature review
Machine learning algorithms are widely used for financial market prediction and portfolio construction, especially in automated trading strategies. Sutton, Barto, and Williams (1992) first introduced the reinforcement learning method (Q-learning) and analytically proved its capabilities for one class of adaptive optimal control problems. Recurrent reinforcement learning was introduced by Moody et al. (1998), where it was applied to stock trading as a learning algorithm, and they…
Methodology
In this paper, we use the recurrent reinforcement learning method for portfolio optimization with different risk considerations, expressed through two objective functions. Following Moody et al. (1998), we use the differential Sharpe ratio for dynamic optimization of trading system performance. We use performance functions both to improve the convergence of the learning process and to adapt to changing market conditions during live trading (see Fig. 1). During this process, the parameter updates can be done during…
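The differential Sharpe ratio of Moody et al. (1998) admits a simple online form, built from exponential moving estimates A and B of the first and second moments of returns. The adaptation rate eta and the initialization below are illustrative assumptions.

```python
import numpy as np

def differential_sharpe(returns, eta=0.01):
    """Online differential Sharpe ratio (Moody et al., 1998):
    D_t = (B*dA - 0.5*A*dB) / (B - A**2)**1.5, where A and B are
    exponential moving estimates of the first and second moments."""
    A, B = returns[0], returns[0] ** 2   # illustrative initialization
    out = []
    for r in returns[1:]:
        dA, dB = r - A, r ** 2 - B
        denom = (B - A ** 2) ** 1.5
        out.append((B * dA - 0.5 * A * dB) / denom if denom > 0 else 0.0)
        A += eta * dA                    # update moving estimates
        B += eta * dB
    return np.array(out)
```

Because D_t depends only on the latest return and the two running estimates, it serves as an immediate reward for online gradient updates rather than requiring the whole return history at each step.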
Trading algorithms comparison
In this section, we first compare three performance ratios as three different objective functions for the model. This results in three trading algorithms producing different trading decisions for the same set of assets, so that we can readily assess the merits of each performance ratio in generating trading signals. The resulting portfolio rebalancing methods are: the Sharpe ratio RRL (SR-RRL), the Sterling ratio RRL (TR-RRL), and the Calmar ratio RRL (CR-RRL).
We show…
Trading system and discussion
We develop an adaptive trading system based on recurrent reinforcement learning using three different objective functions. The recurrent reinforcement learning system is a recursive learning system in which the system learns from every output at every time step. In this system, the trader can select the objective function best suited to the assets in their portfolio; the system parameters are trained according to the chosen objective function. We have introduced three objective functions…
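The recursive signal generation at the heart of such a system can be sketched as follows. The lookback length and parameter values are placeholders; in the actual system the weights are trained by gradient ascent on the chosen differential performance ratio.

```python
import numpy as np

def rrl_signal(prices, w, u, b, lookback=8):
    """Recurrent trading signal F_t = tanh(w . x_t + u*F_{t-1} + b).
    x_t holds the last `lookback` price changes; feeding the previous
    position F_{t-1} back in lets the learner account for the cost of
    changing positions. F_t in (-1, 1) acts as a long-short weight."""
    r = np.diff(np.asarray(prices, dtype=float))
    F = np.zeros(len(r) + 1)             # F[0] is the flat starting position
    for t in range(lookback, len(r)):
        x = r[t - lookback:t]            # recent price-change window
        F[t + 1] = np.tanh(np.dot(w, x) + u * F[t] + b)
    return F[1:]
```

The tanh output makes the position a continuous variable weight rather than a discrete buy/sell flag, which is what allows the long-short variable-weight allocation discussed above.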
Conclusion
In this paper, we use the recurrent reinforcement learning method to solve a dynamic portfolio optimization problem in which we develop four portfolios using RRL and compare them with each other and with a buy-and-hold portfolio. We use RRL methods to optimize the portfolio weights and rebalance the portfolio over a predefined time horizon. We compare the differentials of the Sharpe ratio and the Calmar ratio as objective functions in the recurrent reinforcement learning process and examine the…
References (40)
- et al. (2011). A decision support system for strategic asset allocation. Decision Support Systems.
- et al. (2016). Computational intelligence and financial markets: A survey and future directions. Expert Systems with Applications.
- et al. (2000). Heuristics for cardinality constrained portfolio optimisation. Computers & Operations Research.
- Dempster & Leemans (2006). An automated FX trading system using adaptive reinforcement learning. Expert Systems with Applications.
- et al. (2014). Intelligent trading of seasonal effects: A decision support algorithm based on reinforcement learning. Decision Support Systems.
- Feuerriegel & Prendinger (2016). News-based trading strategies. Decision Support Systems.
- et al. (2014). A learning-guided multi-objective evolutionary algorithm for constrained portfolio optimization. Applied Soft Computing.
- et al. (2015). A hybrid approach to portfolio composition based on fundamental and technical indicators. Expert Systems with Applications.
- Tan, Quek, & Cheng (2011). Stock trading with cycles: A financial application of ANFIS and reinforcement learning. Expert Systems with Applications.
- Bertoluzzo & Corazza (2008). Financial trading systems: Is recurrent reinforcement learning the way? Reflexing Interfaces: The Complex Coevolution of Information Technology Ecosystems.
- Bertsekas (1995). Dynamic programming and optimal control.
- Bhansali (2007). Putting economics (back) into quantitative models. The Journal of Portfolio Management.
- Institutional equity trading costs: NYSE versus Nasdaq. The Journal of Finance.
- Beyond technical analysis: How to develop and implement a winning trading system.
- The mean-variance cardinality constrained portfolio optimization problem using a local search-based multi-objective evolutionary algorithm. Applied Intelligence.
- Implementing a simple rule for dynamic stop-loss strategies. The Journal of Investing.
- Optimal versus naive diversification: How inefficient is the 1/N portfolio strategy? Review of Financial Studies.
- Gold (2003). FX trading via recurrent reinforcement learning. Proceedings of the 2003 IEEE International Conference on Computational Intelligence for Financial Engineering (CIFEr).
- Gold (2003). FX trading via recurrent reinforcement learning. IEEE/IAFE Conference on Computational Intelligence for Financial Engineering (CIFEr) Proceedings.
- Maringer & Ramtohul (2011). Application of stochastic recurrent reinforcement learning to index trading. ESANN 2011 Proceedings, 19th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning.