ABSTRACT
Portfolio optimization is a control problem in which the objective is to find a strategy for selecting the proportions of assets that maximizes return. Conventional approaches formulate the problem as a single Markov decision process (MDP) and apply reinforcement learning methods to solve it. However, financial markets are well known to be non-stationary, which violates the stationarity assumption underlying the single-MDP formulation. In this work, we reformulate the portfolio optimization problem to deal with the non-stationary nature of financial markets. In our approach, we divide a long-term process into multiple short-term processes to adapt to context changes and treat portfolio optimization as a multitask control problem. We then propose an evolutionary meta reinforcement learning approach that searches for an initial policy able to adapt quickly to the upcoming target tasks. We model the policies as convolutional networks that score pattern matches in market data charts. Finally, we test our approach on real-world cryptocurrency data and show that it adapts well to changes in the market and yields better profitability.
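The abstract describes the method only at a high level. The following is a minimal sketch of how such a pipeline could be organized, not the authors' implementation: it assumes a Salimans-style evolution strategies outer loop over the initial policy parameters, a gradient-based inner adaptation step on each short-term task, a simple convolutional architecture, and the hypothetical helpers `make_task_windows` (splits the long-term market history into short-term task MDPs) and `rollout_return` (a differentiable trading-simulation return), none of which appear in the abstract.

```python
# Sketch of an evolutionary meta-RL loop for multitask portfolio optimization.
# Assumptions: ES outer loop, SGD inner adaptation, and the hypothetical
# helpers make_task_windows / rollout_return standing in for the data
# pipeline and trading simulator.
import copy
import torch
import torch.nn as nn

class ChartPatternPolicy(nn.Module):
    """Convolutional policy that scores chart patterns in a price window."""
    def __init__(self, n_assets: int, window: int, n_features: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(n_features, 16, kernel_size=(1, 5)), nn.ReLU(),
            nn.Conv2d(16, 16, kernel_size=(1, window - 4)), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(16 * n_assets, n_assets),
        )

    def forward(self, x):  # x: (batch, n_features, n_assets, window)
        return torch.softmax(self.net(x), dim=-1)  # portfolio weights

def adapt(policy, task, steps=5, lr=1e-3):
    """Inner loop: fine-tune a copy of the initial policy on one short-term task."""
    adapted = copy.deepcopy(policy)
    opt = torch.optim.SGD(adapted.parameters(), lr=lr)
    for _ in range(steps):
        # rollout_return is assumed to return a differentiable return estimate.
        loss = -rollout_return(adapted, task)
        opt.zero_grad(); loss.backward(); opt.step()
    return adapted

def meta_train(policy, market_data, generations=100, pop=32, sigma=0.05, alpha=0.01):
    """Outer loop: evolution strategies over the initial policy parameters."""
    theta = nn.utils.parameters_to_vector(policy.parameters()).detach()
    for _ in range(generations):
        tasks = make_task_windows(market_data)   # short-term task MDPs
        eps = torch.randn(pop, theta.numel())    # population of perturbations
        fitness = torch.empty(pop)
        for i in range(pop):
            nn.utils.vector_to_parameters(theta + sigma * eps[i], policy.parameters())
            # Fitness is measured AFTER inner-loop adaptation, so evolution
            # favors initial parameters that adapt quickly to new tasks.
            fitness[i] = sum(rollout_return(adapt(policy, t), t).item() for t in tasks)
        advantage = (fitness - fitness.mean()) / (fitness.std() + 1e-8)
        theta = theta + alpha / (pop * sigma) * (eps.T @ advantage)  # ES update
    nn.utils.vector_to_parameters(theta, policy.parameters())
    return policy
```

The key design choice in this sketch is that each candidate's fitness is its post-adaptation return rather than its raw return, so the evolutionary search selects for adaptability, in the spirit of Baldwinian meta-learning; at deployment time, only the cheap inner-loop `adapt` step would run on each new short-term market window.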