
Evolutionary meta reinforcement learning for portfolio optimization

Published: 26 June 2021
DOI: 10.1145/3449639.3459386

ABSTRACT

Portfolio optimization is a control problem whose objective is to find the optimal strategy for selecting the proportions of assets so as to maximize return. Conventional approaches formulate the problem as a single Markov decision process and apply reinforcement learning methods to solve it. However, financial markets are well known to be non-stationary, which violates the stationarity assumption underlying these methods. In this work, we reformulate the portfolio optimization problem to deal with the non-stationary nature of financial markets. In our approach, we divide a long-term process into multiple short-term processes to adapt to context changes and treat portfolio optimization as a multitask control problem. We then propose an evolutionary meta reinforcement learning approach that searches for an initial policy able to adapt quickly to upcoming target tasks. We model the policies as convolutional networks that score how well patterns in market data charts match. Finally, we test our approach on real-world cryptocurrency data and show that it adapts well to changes in the market and achieves better profitability.
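
To make the task decomposition and the meta-learning loop concrete, the sketch below shows one simple way these pieces could fit together; it is not the authors' implementation. A long price series is cut into short-term windows that serve as tasks, a small linear pattern-scoring policy stands in for the paper's convolutional chart-scoring network, and an evolution-strategy outer loop searches for initial parameters whose brief inner-loop adaptation performs well on the held-out part of each task. All function names, hyperparameters, and the synthetic price data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
LOOKBACK, TASK_LEN, N_ASSETS = 16, 64, 3

def make_tasks(prices, task_len=TASK_LEN):
    """Divide the long-term price series into short-term tasks (non-overlapping windows)."""
    return [prices[i:i + task_len] for i in range(0, len(prices) - task_len, task_len)]

def rollout(theta, window):
    """Trade through one window with a linear pattern-scoring policy; return total log return."""
    log_ret = np.diff(np.log(window), axis=0)             # (T-1, n_assets) log returns
    total = 0.0
    for t in range(LOOKBACK, len(log_ret)):
        pattern = log_ret[t - LOOKBACK:t]                 # recent return pattern per asset
        scores = pattern.T @ theta                        # one score per asset
        w = np.exp(scores - scores.max())
        w /= w.sum()                                      # long-only portfolio weights
        total += float(w @ log_ret[t])                    # next-step portfolio log return
    return total

def adapt(theta, task, steps=5, sigma=0.05):
    """Inner loop: quickly adapt the initial policy on the first half of a task (hill climbing)."""
    half = len(task) // 2
    best, best_fit = theta, rollout(theta, task[:half])
    for _ in range(steps):
        cand = best + sigma * rng.standard_normal(best.shape)
        fit = rollout(cand, task[:half])
        if fit > best_fit:
            best, best_fit = cand, fit
    return best

def meta_fitness(theta, task):
    """Fitness of an initial policy: performance after adaptation, on the unseen half of the task."""
    return rollout(adapt(theta, task), task[len(task) // 2:])

def meta_train(tasks, iters=50, pop=20, sigma=0.1, lr=0.05):
    """Outer loop: evolution strategies over the initial parameters theta."""
    theta = np.zeros(LOOKBACK)
    for _ in range(iters):
        task = tasks[rng.integers(len(tasks))]            # sample one short-term task
        eps = rng.standard_normal((pop, LOOKBACK))
        fits = np.array([meta_fitness(theta + sigma * e, task) for e in eps])
        fits = (fits - fits.mean()) / (fits.std() + 1e-8) # standardize fitness values
        theta = theta + lr / (pop * sigma) * eps.T @ fits # ES gradient estimate
    return theta

# Toy usage on synthetic geometric-random-walk prices (placeholder for real market data).
prices = 100.0 * np.exp(np.cumsum(0.01 * rng.standard_normal((1024, N_ASSETS)), axis=0))
theta_init = meta_train(make_tasks(prices))
print("meta-learned initial parameters:", np.round(theta_init, 3))
```

In the paper itself the policy is a convolutional network that scores chart patterns and the target tasks are upcoming market regimes; the sketch keeps only the structure described in the abstract, namely an outer evolutionary search for an initial policy and an inner fast adaptation to each short-term task.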

Published in

GECCO '21: Proceedings of the Genetic and Evolutionary Computation Conference
June 2021, 1219 pages
ISBN: 9781450383509
DOI: 10.1145/3449639
Copyright © 2021 ACM
Publisher: Association for Computing Machinery, New York, NY, United States
