ABSTRACT
Portfolio optimization is a control problem in which the objective is to find a strategy for selecting the proportions of assets that maximizes return. Conventional approaches formulate the problem as a single Markov decision process (MDP) and apply reinforcement learning methods to solve it. However, financial markets are well known to be non-stationary, which violates the stationarity assumption underlying the single-MDP formulation. In this work, we reformulate the portfolio optimization problem to deal with the non-stationary nature of financial markets. In our approach, we divide a long-term process into multiple short-term processes to adapt to context changes and treat portfolio optimization as a multitask control problem. We then propose an evolutionary meta reinforcement learning approach that searches for an initial policy able to adapt quickly to the upcoming target tasks. We model the policies as convolutional networks that score pattern matches in market data charts. Finally, we test our approach on real-world cryptocurrency data and show that it adapts well to changes in the market and yields better profitability.
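The abstract describes the method only at a high level. The following is a minimal sketch of how such a pipeline could be organized, not the authors' implementation: it assumes a Salimans-style evolution strategies outer loop over the initial policy parameters, a gradient-based inner adaptation step on each short-term task, a simple convolutional architecture, and the hypothetical helpers `make_task_windows` (splits the long-term market history into short-term task MDPs) and `rollout_return` (a differentiable trading-simulation return), none of which appear in the abstract.

```python
# Sketch of an evolutionary meta-RL loop for multitask portfolio optimization.
# Assumptions: ES outer loop, SGD inner adaptation, and the hypothetical
# helpers make_task_windows / rollout_return standing in for the data
# pipeline and trading simulator.
import copy
import torch
import torch.nn as nn

class ChartPatternPolicy(nn.Module):
    """Convolutional policy that scores chart patterns in a price window."""
    def __init__(self, n_assets: int, window: int, n_features: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(n_features, 16, kernel_size=(1, 5)), nn.ReLU(),
            nn.Conv2d(16, 16, kernel_size=(1, window - 4)), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(16 * n_assets, n_assets),
        )

    def forward(self, x):  # x: (batch, n_features, n_assets, window)
        return torch.softmax(self.net(x), dim=-1)  # portfolio weights

def adapt(policy, task, steps=5, lr=1e-3):
    """Inner loop: fine-tune a copy of the initial policy on one short-term task."""
    adapted = copy.deepcopy(policy)
    opt = torch.optim.SGD(adapted.parameters(), lr=lr)
    for _ in range(steps):
        # rollout_return is assumed to return a differentiable return estimate.
        loss = -rollout_return(adapted, task)
        opt.zero_grad(); loss.backward(); opt.step()
    return adapted

def meta_train(policy, market_data, generations=100, pop=32, sigma=0.05, alpha=0.01):
    """Outer loop: evolution strategies over the initial policy parameters."""
    theta = nn.utils.parameters_to_vector(policy.parameters()).detach()
    for _ in range(generations):
        tasks = make_task_windows(market_data)   # short-term task MDPs
        eps = torch.randn(pop, theta.numel())    # population of perturbations
        fitness = torch.empty(pop)
        for i in range(pop):
            nn.utils.vector_to_parameters(theta + sigma * eps[i], policy.parameters())
            # Fitness is measured AFTER inner-loop adaptation, so evolution
            # favors initial parameters that adapt quickly to new tasks.
            fitness[i] = sum(rollout_return(adapt(policy, t), t).item() for t in tasks)
        advantage = (fitness - fitness.mean()) / (fitness.std() + 1e-8)
        theta = theta + alpha / (pop * sigma) * (eps.T @ advantage)  # ES update
    nn.utils.vector_to_parameters(theta, policy.parameters())
    return policy
```

The key design choice in this sketch is that each candidate's fitness is its post-adaptation return rather than its raw return, so the evolutionary search selects for adaptability, in the spirit of Baldwinian meta-learning; at deployment time, only the cheap inner-loop `adapt` step would run on each new short-term market window.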