Deep Reinforcement Learning with Comprehensive Reward for Stock Trading

Zhou, Qibin; Qu, Tuo; Han, Yuntao; Duan, Fuqing

doi:10.1007/978-981-99-1648-1_44

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1794)

Included in the following conference series:

International Conference on Neural Information Processing

Abstract

Stock trading is a research hotspot in economics. Over the past decades, many researchers have used machine learning methods to predict either the short-term price or the long-term trend of stocks. However, the risk of stock trading can be better reduced only by considering the two together. This paper models stock trading as an incomplete-information game and proposes a deep reinforcement learning framework for training trading agents. To make good use of the temporal relations in stock data, we select the state-of-the-art Temporal Convolutional Network and Transformer network as the policy networks in deep reinforcement learning, and use Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO) for policy optimization. We propose a reward function that integrates short-term stock price prediction and long-term stock trend prediction with controllable risk to compute the utility of the agent's actions, which allows the agent to learn low-risk trading strategies. Trading experiments on the Standard & Poor's 500 ETF (S&P 500 index) validate the proposed method: the strategies it learns outperform the S&P 500 ETF baseline strategy on the economic indicators considered (maximum drawdown, Sharpe ratio, and return curve).
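
Two of the economic indicators named in the abstract, maximum drawdown and Sharpe ratio, can be illustrated with a minimal sketch. The function names, the 252-trading-day annualization, the zero default risk-free rate, and the toy equity curve are all assumptions made for illustration; they are not taken from the paper and do not reproduce its results.

```python
import numpy as np


def max_drawdown(equity):
    """Largest peak-to-trough decline of an equity curve, as a fraction of the peak."""
    equity = np.asarray(equity, dtype=float)
    running_peak = np.maximum.accumulate(equity)  # highest value seen so far
    drawdowns = (running_peak - equity) / running_peak
    return float(drawdowns.max())


def sharpe_ratio(returns, risk_free_rate=0.0, periods_per_year=252):
    """Annualized Sharpe ratio from per-period simple returns.

    risk_free_rate is an annual rate; periods_per_year=252 assumes daily data.
    """
    returns = np.asarray(returns, dtype=float)
    excess = returns - risk_free_rate / periods_per_year
    std = excess.std(ddof=1)  # sample standard deviation
    if std == 0:
        return 0.0  # no variability: the ratio is undefined, report 0 by convention
    return float(np.sqrt(periods_per_year) * excess.mean() / std)


# Toy equity curve (made-up numbers, not results from the paper).
equity = np.array([100.0, 110.0, 99.0, 105.0, 120.0])
daily_returns = equity[1:] / equity[:-1] - 1.0

print(f"max drawdown: {max_drawdown(equity):.2%}")
print(f"Sharpe ratio: {sharpe_ratio(daily_returns):.2f}")
```

A lower maximum drawdown and a higher Sharpe ratio than the buy-and-hold ETF baseline are what the abstract means by a strategy being "better" on these indicators.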

Supported by National Key Research and Development Project under Grant 2018AAA01008-02.

Author information

Authors and Affiliations

Beijing Normal University, Beijing, China
Qibin Zhou, Tuo Qu, Yuntao Han & Fuqing Duan

Corresponding author

Correspondence to Qibin Zhou.

Editor information

Editors and Affiliations

Indian Institute of Technology Indore, Indore, India
Mohammad Tanveer
Indian Institute of Information Technology - Allahabad, Prayagraj, India
Sonali Agarwal
Kobe University, Kobe, Japan
Seiichi Ozawa
Indian Institute of Technology Patna, Patna, India
Asif Ekbal
University of Innsbruck, Innsbruck, Austria
Adam Jatowt

About this paper

Cite this paper

Zhou, Q., Qu, T., Han, Y., Duan, F. (2023). Deep Reinforcement Learning with Comprehensive Reward for Stock Trading. In: Tanveer, M., Agarwal, S., Ozawa, S., Ekbal, A., Jatowt, A. (eds) Neural Information Processing. ICONIP 2022. Communications in Computer and Information Science, vol 1794. Springer, Singapore. https://doi.org/10.1007/978-981-99-1648-1_44

DOI: https://doi.org/10.1007/978-981-99-1648-1_44
Published: 15 April 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-1647-4
Online ISBN: 978-981-99-1648-1
eBook Packages: Computer Science, Computer Science (R0)
