Abstract
Financial portfolio management is the continual reallocation of capital across financial assets, with the goal of maximizing profit under a given level of risk. Since AlphaGo defeated professional human players, deep reinforcement learning (DRL) algorithms have been widely applied in many fields, including quantitative trading. Multi-agent systems are a relatively new research branch of DRL, and in most cases they outperform a single agent. In this paper, we propose a novel multi-agent deep reinforcement learning algorithm with trend consistency regularization (TC-MARL) to find the optimal portfolio. We divide the trend of the stocks in a portfolio into two categories and train two different agents to learn the optimal trading strategy under each trend regime. First, we build a trend consistency (TC) factor that recognizes whether the trends of the stocks in a portfolio are consistent: when the trends are consistent, the factor is defined as 1; when they are inconsistent, it is defined as \(-1\). Based on this factor, a novel regularization term related to the portfolio weights, named TC regularization, is added to the reward function, with the TC factor value serving as the sign of the regularization term. In this way, two agents with different reward functions are constructed, sharing the same policy model and value model. The proposed TC-MARL algorithm then dynamically switches between the two trained agents according to the market status to find the optimal portfolio strategy. Extensive experimental results on the Chinese stock market demonstrate the effectiveness of the proposed algorithm.
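As a rough illustration of the mechanism described in the abstract, the sketch below computes a TC factor that is 1 when all stocks in the portfolio trend in the same direction over a look-back window and \(-1\) otherwise, and then adds a weight-dependent regularization term whose sign is given by the TC factor. The function names, the look-back rule, the coefficient `lam`, and the squared-weight "concentration" regularizer are illustrative assumptions for this sketch only; the paper defines its own form of the weight-related regularization.

```python
import numpy as np

def tc_factor(prices: np.ndarray) -> int:
    """Illustrative TC factor: +1 if every stock in the look-back window
    trends in the same direction (all rising or all falling), else -1.
    `prices` has shape (T, n_assets)."""
    returns = prices[-1] / prices[0] - 1.0          # trend over the window
    if np.all(returns > 0) or np.all(returns < 0):  # consistent trend across the portfolio
        return 1
    return -1

def tc_regularized_reward(base_reward: float,
                          weights: np.ndarray,
                          tc: int,
                          lam: float = 0.1) -> float:
    """Reward plus a hypothetical weight-based regularizer; the TC factor
    sets the sign of the regularization term, as in the paper's idea."""
    concentration = np.sum(weights ** 2)            # illustrative weight-dependent term
    return base_reward + tc * lam * concentration

# Toy usage: three assets, five time steps of closing prices
prices = np.array([[10.0, 20.0, 30.0],
                   [10.2, 20.5, 30.3],
                   [10.4, 20.9, 30.9],
                   [10.5, 21.2, 31.2],
                   [10.8, 21.6, 31.8]])
weights = np.array([0.5, 0.3, 0.2])
tc = tc_factor(prices)                              # +1 here: all three stocks rose
print(tc, tc_regularized_reward(0.02, weights, tc))
```

In this toy run the two agents would simply correspond to fixing the sign `tc` at \(+1\) or \(-1\) in the reward, which is how the sketch mirrors the two reward functions described above.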





Data availability
All data are publicly available. Users can download the related data from the JoinQuant website (https://www.joinquant.com) or other financial databases.
Acknowledgements
This work was supported in part by the Ministry of Education of Humanities and Social Science Project of China (No. 22XJCZH004), in part by the National Natural Science Foundation of China (Nos. 12201497, 61976174), in part by the Scientific Research Project of Shaanxi Provincial Department of Education (Nos. 22JK0186, 21JK0379), and in part by the Fundamental Research Funds for the Central Universities (No. D5000220060).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Human or animal participants
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix A: The summary of notations
Here, we list all necessary notations used in this paper, as shown in Table 6.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Ma, C., Zhang, J., Li, Z. et al. Multi-agent deep reinforcement learning algorithm with trend consistency regularization for portfolio management. Neural Comput & Applic 35, 6589–6601 (2023). https://doi.org/10.1007/s00521-022-08011-9