
Expert Systems with Applications

Volume 87, 30 November 2017, Pages 267-279

An adaptive portfolio trading system: A risk-return portfolio optimization using recurrent reinforcement learning with expected maximum drawdown

https://doi.org/10.1016/j.eswa.2017.06.023

Highlights

  • A reinforcement learning trading algorithm with expected drawdown risk is proposed.

  • The expected maximum drawdown is shown to improve portfolio signal generation.

  • The effectiveness of the method is validated using different transaction costs.

  • An adaptive portfolio rebalancing system with automated retraining is recommended.

Abstract

Dynamic control theory has long been used in solving optimal asset allocation problems, and a number of trading decision systems based on reinforcement learning methods have been applied in asset allocation and portfolio rebalancing. In this paper, we extend the existing work in recurrent reinforcement learning (RRL) and build an optimal variable weight portfolio allocation under a coherent downside risk measure, the expected maximum drawdown, E(MDD). In particular, we propose a recurrent reinforcement learning method, with a coherent risk-adjusted performance objective function, the Calmar ratio, to obtain both buy and sell signals and asset allocation weights. Using a portfolio consisting of the most frequently traded exchange-traded funds, we show that the expected maximum drawdown risk-based objective function yields superior return performance compared to previously proposed RRL objective functions (i.e. the Sharpe ratio and the Sterling ratio), and that variable weight RRL long/short portfolios outperform equal weight RRL long/short portfolios under different transaction cost scenarios. We further propose an adaptive E(MDD) risk-based RRL portfolio rebalancing decision system with a transaction cost and market condition stop-loss retraining mechanism, and we show that the proposed portfolio trading system responds better to transaction cost effects and outperforms hedge fund benchmarks consistently.

Introduction

In financial investing, a general goal is to dynamically allocate a set of assets so as to maximize returns over time while simultaneously minimizing risk. For investors it is essential to be able to invest in a portfolio that satisfies their preset goals by building an optimal portfolio initially and subsequently rebalancing it optimally. Portfolio theory began with the mean-variance optimization of Markowitz (1952), who proposed portfolio selection by maximizing the expected return while minimizing risk in the form of covariance matrices. Rebalancing a portfolio re-optimizes its weights over a predefined time horizon. The application of dynamic asset allocation using dynamic programming methods was originally introduced by Bertsekas (1995). Due to the curse of dimensionality in dynamic programming, automated self-learning algorithms are normally applied by investors and scholars in designing optimal trading strategies instead. The reinforcement learning method is a type of approximate dynamic programming and a subcategory of machine learning introduced by Sutton and Barto (1998), and it has been broadly applied by investors and researchers in building strategic asset allocation decision systems (Dempster & Leemans, 2006; Feuerriegel & Prendinger, 2016; Gold, 2003a; Tan, Quek, & Cheng, 2011).

In this paper, we apply the recurrent reinforcement learning (RRL) method with a statistically coherent downside risk-adjusted performance objective function to simultaneously generate both buy/sell signals and optimal asset allocation weights. Moody, Wu, Liao, and Saffell (1998) introduced recurrent reinforcement learning in building a trading system, where they examined the performance effect of using the Sharpe ratio vs. several economic utility functions. They concluded that the Sharpe ratio behaves like an adaptive utility function, and that maximizing the differential Sharpe ratio as an immediate reward in an online learning mode significantly outperforms maximizing profits directly. Most of the subsequent work (e.g. Gold, 2003b; Maringer & Ramtohul, 2010, 2012; Zhang & Maringer, 2016) focused on equally weighted portfolios. Although both Moody et al. (1998) and Bertoluzzo and Corazza (2008) mentioned potential drawdown effects on RRL performance, neither thoroughly examined the actual effects.
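As background, the single-asset RRL trader of Moody et al. (1998) produces a position in [-1, 1] from current market features and its own previous position; the sketch below illustrates this one-step decision function (parameter names and values are illustrative, not taken from the paper):

```python
import numpy as np

def rrl_signal(weights, features, prev_signal):
    """One step of a recurrent trader in the style of Moody et al. (1998).

    The position F_t in [-1, 1] depends on current market features and on
    the previous position; the recurrent term is what lets the learner
    account for transaction costs incurred by changing positions.
    """
    w, u, b = weights  # feature weights, recurrent weight, bias
    return np.tanh(np.dot(w, features) + u * prev_signal + b)

# Example: a positive signal reinforced by the previous long position
signal = rrl_signal((np.array([0.5, -0.2]), 0.8, 0.0),
                    np.array([0.01, -0.003]), prev_signal=1.0)
```

Because the previous position feeds back into the next decision, gradient-based training must unroll this recurrence through time, which is what distinguishes RRL from a memoryless policy.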

Many practitioners tend to adjust the commonly accepted theoretical models to apply them to their particular situations, or develop measures that focus on their specific interests. They often neglect the theoretical aspects or assumptions underlying their adjustments, such as in the safety-first risk measures (i.e. the Sharpe ratio, the Sortino ratio, the Sterling ratio, and the Calmar ratio). Bhansali (2007) and Zimmermann, Drobetz, and Oertmann (2003) noted that many risk measures based on the estimation of covariance matrices from historical data fail notoriously when they are needed most. They agreed that the difference in volatility and correlations between up and down market environments implies that the risk reduction potential is limited, leaving such measures incapable of foreseeing stress-type events. We argue that large drawdowns usually lead to fund redemption, and hence they should lead to very different optimal decisions. In this paper, we extend the variable weight RRL long-only approach of Moody and Saffell (2001) to a long-short approach and examine the effect of the expected maximum drawdown, E(MDD) (Magdon-Ismail & Atiya, 2004), on portfolio performance under the joint interaction of transaction costs. Magdon-Ismail, Atiya, Pratap, and Abu-Mostafa (2003) and Magdon-Ismail and Atiya (2004) provided a statistically coherent downside risk measure, the Calmar ratio with the expected maximum drawdown, which gives us a theoretical basis for applying this downside risk measure as a differentiable objective function in RRL. This E(MDD) based Calmar ratio (Magdon-Ismail et al., 2003) is distinctly different from the exponential moving average drawdown approach used by Moody and Saffell (2001).

More specifically, we compare the Calmar ratio with the Sharpe ratio, where the latter's risk-adjusted measure of performance is calculated from the standard deviation of the returns over a predefined time horizon. Furthermore, we use the recurrent reinforcement learning method with two different objective functions through which we incorporate different risk considerations. We show that recurrent reinforcement learning with variable weight asset allocation gives a superior performance when applied to a set of highly liquid exchange-traded funds (ETFs) with various transaction cost considerations over a 5-year period. We also document that when expected maximum drawdowns are considered, the RRL can generate a portfolio superior to the ones generated by the standard deviation based performance measure, the Sharpe ratio. This confirms the intuition that a reasonably low MDD is critical to the success of any fund.
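For concreteness, a maximum drawdown and a Calmar-style ratio can be computed as in the sketch below. Note this uses the realized MDD as an empirical stand-in for the expected maximum drawdown E(MDD) of Magdon-Ismail and Atiya (2004), and the 252-day annualization is an assumption, not a detail from the paper:

```python
import numpy as np

def max_drawdown(cum_returns):
    """Largest peak-to-trough decline of a cumulative return path."""
    running_peak = np.maximum.accumulate(cum_returns)
    return np.max(running_peak - cum_returns)

def calmar_ratio(returns, periods_per_year=252):
    """Annualized mean return divided by the maximum drawdown.

    The paper's objective uses the analytical E(MDD); here the realized
    MDD serves as an empirical proxy for illustration.
    """
    cum = np.cumsum(returns)  # additive (log-return style) path
    mdd = max_drawdown(cum)
    annualized = np.mean(returns) * periods_per_year
    return annualized / mdd if mdd > 0 else np.inf
```

In contrast, a Sharpe-style ratio would divide the same annualized return by the standard deviation of returns, which treats upside and downside volatility symmetrically; the drawdown denominator penalizes only the loss paths that trigger fund redemptions.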

In addition, we propose a portfolio allocation and rebalancing system using RRL with E(MDD) as the performance measure; this trading system jointly considers transaction costs and market conditions to automatically retrain the system parameters and achieve better performance. We show that a trading system with a stop-loss based on the market volatility regime enables the portfolio to endure higher transaction costs, in that the stop-loss strategy exits the market when volatility is high, retrains the parameters of the signal-generating process, and generates new signals to reenter the market. Such a trading decision system is adaptive to market conditions and more resilient to transaction cost shocks.

The rest of the paper is organized as follows. In Section 2, we review existing work on dynamic portfolio optimization using reinforcement learning methods. We introduce the expected maximum drawdown and its application to RRL in Section 3. We apply the RRL based portfolio rebalancing approach to a set of ETFs to compare the cost effect of the Sharpe ratio vs. the Calmar ratio using RRL in Section 4. Section 5 conducts a final analysis comparing the performance of the proposed risk-return portfolio optimization with that of two hedge fund indices, and Section 6 concludes the study and identifies some future work.

Section snippets

Literature review

Machine learning algorithms are widely used for financial market prediction and portfolio construction, especially for automated trading strategies. Sutton, Barto, and Williams (1992) first introduced the reinforcement learning method (Q-learning) and provided its analytically proven capabilities for one class of adaptive optimal control problems. Recurrent reinforcement learning was introduced by Moody et al. (1998), where it was applied to stock trading as a learning algorithm and they

Methodology

In this paper, we use the recurrent reinforcement learning method in portfolio optimization with different risk considerations through two objective functions. Following Moody et al. (1998), we use the differential Sharpe ratio for dynamic optimization of trading system performance. We use performance functions both to accelerate the convergence of the learning process and to adapt to changing market conditions during live trading (see Fig. 1). During this process, the parameter updates can be done during
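The differential Sharpe ratio referred to above can be sketched as follows, following the exponential-moving-moment formulation of Moody et al. (1998); the adaptation rate `eta` and the sample values are illustrative:

```python
def differential_sharpe(R, A_prev, B_prev, eta=0.01):
    """Differential Sharpe ratio in the style of Moody et al. (1998).

    A and B are exponential moving estimates of the first and second
    moments of returns. D measures how the new return R changes the
    Sharpe ratio, and serves as the immediate reward for online
    gradient updates of the trader's parameters.
    """
    dA = R - A_prev
    dB = R**2 - B_prev
    denom = (B_prev - A_prev**2) ** 1.5
    D = (B_prev * dA - 0.5 * A_prev * dB) / denom if denom > 0 else 0.0
    A = A_prev + eta * dA   # updated first-moment estimate
    B = B_prev + eta * dB   # updated second-moment estimate
    return D, A, B

# A return above the running mean yields a positive immediate reward
D, A, B = differential_sharpe(R=0.02, A_prev=0.01, B_prev=0.001)
```

Because D depends only on the current return and the two running moments, it can be evaluated at every time step, which is what makes the online learning mode tractable.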

Trading algorithms comparison

In this section, we first compare three performance ratios as different objective functions for the model. This results in three trading algorithms producing different trading decisions for the same set of assets, so that we can readily assess the merits of each performance ratio in generating trading signals. The resulting portfolio rebalancing methods are: the Sharpe ratio RRL (SR-RRL), the Sterling ratio RRL (TR-RRL), and the Calmar ratio RRL (CR-RRL).
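As a point of reference for the third objective, the Sterling ratio admits several definitions in practice; one common variant (the 10% adjustment is a market convention, not necessarily the paper's choice) can be sketched as:

```python
import numpy as np

def sterling_ratio(annual_returns, annual_mdds, adj=0.10):
    """One common Sterling ratio variant (definitions vary).

    Average annual return divided by the average annual maximum
    drawdown plus a conventional 10% adjustment.
    """
    return np.mean(annual_returns) / (np.mean(annual_mdds) + adj)
```

Like the Calmar ratio, it is a drawdown-denominated measure, but it averages drawdowns across years rather than using the single (or expected) maximum drawdown.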

We show

Trading system and discussion

We develop an adaptive trading system based on recurrent reinforcement learning using three different objective functions. The recurrent reinforcement learning system is a recursive learning system, in which the system learns from every output at every time step. In this system, the trader can select the objective function best suited to the assets of their portfolio, and the system parameters are trained based on the desired objective function. We have introduced three objective functions
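The stop-loss retraining behavior described above can be sketched as follows; the rolling-volatility rule, window, threshold, and the `retrain`/`trade` hooks are all illustrative assumptions standing in for the paper's RRL training and signal-generation steps:

```python
import numpy as np

def run_adaptive_system(returns, vol_window=20, vol_threshold=0.02,
                        retrain=None, trade=None):
    """Sketch of an adaptive rebalancing loop with a volatility stop-loss.

    When rolling volatility exceeds the threshold the system exits the
    market (stop-loss) and retrains its parameters on the data seen so
    far; it re-enters once volatility subsides.
    """
    positions = []
    in_market = True
    for t in range(vol_window, len(returns)):
        vol = np.std(returns[t - vol_window:t])
        if in_market and vol > vol_threshold:
            in_market = False           # stop-loss: exit the market
            if retrain is not None:
                retrain(returns[:t])    # refit parameters on data so far
        elif not in_market and vol <= vol_threshold:
            in_market = True            # volatility subsided: re-enter
        positions.append(trade(returns[:t]) if (in_market and trade) else 0.0)
    return positions
```

The key design point is that retraining is triggered by the market regime rather than by a fixed calendar schedule, which is what makes the system resilient to transaction cost shocks during volatile periods.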

Conclusion

In this paper, we use the recurrent reinforcement learning method to solve a dynamic portfolio optimization problem, developing four portfolios using the RRL and comparing them with each other and with the buy & hold portfolio. We use RRL methods to optimize the portfolio weights and rebalance the portfolio over a predefined time horizon. We compare the differential forms of the Sharpe ratio and the Calmar ratio as the objective functions in the recurrent reinforcement learning process and examine the

References (40)

  • D.P. Bertsekas

    Dynamic programming and optimal control

    (1995)
  • V. Bhansali

    Putting economics (back) into quantitative models

    The Journal of Portfolio Management

    (2007)
  • L.K.C. Chan et al.

    Institutional equity trading costs: NYSE versus Nasdaq

    The Journal of Finance

    (1997)
  • T.S. Chande

    Beyond technical analysis: How to develop and implement a winning trading system

    (2001)
  • B. Chen et al.

    The mean-variance cardinality constrained portfolio optimization problem using a local search-based multi-objective evolutionary algorithm

    Applied Intelligence

    (2017)
  • J. Chevallier et al.

    Implementing a simple rule for dynamic stop-loss strategies

    The Journal of Investing

    (2012)
  • V. DeMiguel et al.

    Optimal versus naive diversification: How inefficient is the 1/N portfolio strategy?

    Review of Financial Studies

    (2009)
  • C. Gold

    FX trading via recurrent reinforcement learning

    Proceedings of the 2003 IEEE International Conference on Computational Intelligence for Financial Engineering

    (2003)
  • C. Gold

    FX trading via recurrent reinforcement learning

    Proceedings of the IEEE/IAFE Conference on Computational Intelligence for Financial Engineering (CIFEr)

    (2003)
  • D. Gorse

    Application of stochastic recurrent reinforcement learning to index trading

    Proceedings of ESANN 2011, 19th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning

    (2010)