
A Deep Reinforcement Learning Framework for Optimal Trade Execution

  • Conference paper
Machine Learning and Knowledge Discovery in Databases. Applied Data Science and Demo Track (ECML PKDD 2020)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12461)

Abstract

In this article, we propose a deep reinforcement learning based framework that learns to minimize trade execution costs by splitting a sell order into child orders and executing them sequentially over a fixed period. The framework is based on a variant of the Deep Q-Network (DQN) algorithm that integrates Double DQN, the Dueling Network architecture, and Noisy Nets. In contrast to previous research work, which uses implementation shortfall as the immediate reward, we use a shaped reward structure, and we also incorporate the zero-ending-inventory constraint into the DQN algorithm by slightly modifying the Q-function update at the final step relative to standard Q-learning.

We demonstrate that the DQN-based optimal trade execution framework (1) converges quickly during the training phase, (2) outperforms TWAP, VWAP, Almgren-Chriss (AC), and two DQN algorithms in backtests on 14 US equities, and (3) improves stability by incorporating the zero-ending-inventory constraint.
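The final-step modification can be sketched in a few lines. The following is a minimal PyTorch illustration, assuming a Double DQN target and a per-sample boolean mask `is_final_step`; the tensor names and the handling of the forced liquidation are illustrative assumptions, not the authors' exact implementation.

```python
import torch

def double_dqn_target(reward, next_q_online, next_q_target, is_final_step,
                      gamma=1.0):
    """Double DQN target with a zero-ending-inventory modification.

    reward, is_final_step: shape [batch]; next_q_*: shape [batch, n_actions].
    """
    # Standard Double DQN: the online network selects the greedy next
    # action, and the target network evaluates it.
    best_next = next_q_online.argmax(dim=1, keepdim=True)
    bootstrap = next_q_target.gather(1, best_next).squeeze(1)
    target = reward + gamma * bootstrap
    # Zero-ending-inventory constraint: at the final step any remaining
    # shares must be liquidated, so no future value is bootstrapped; the
    # target collapses to the immediate (shaped) reward of that forced
    # liquidation.
    return torch.where(is_final_step, reward, target)
```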


Notes

  1. For example, the differences in immediate rewards between the current period and the arrival time at various volume levels, and manually crafted indicators that flag specific market scenarios (e.g., a regime shift or a significant trend in price changes).

  2. TWAP denotes the trading volume of the TWAP strategy in one step.

  3. \(\text {TWAP order}= \frac{\text {total }\#\text { of shares to trade} }{\text {total } \#\text { of periods}}\).

  4. \(\text {Implementation Shortfall (IS)} = \text {arrival price}\times \text {traded volume} - \text {executed price}\times \text {traded volume}\); see the worked example after these notes.

  5. \(\mathrm {AvgIS}\) is the average IS.

  6. \(\mathrm {IS}_{t}\) is the IS at time \(t\).

  7. Please refer to the appendix for the chosen hyperparameters.
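As a concrete illustration of the quantities defined in notes 3 through 6, here is a minimal worked example; all prices and volumes are made-up values, not data from the paper.

```python
# A worked example of the quantities in notes 3-6; all numbers are
# illustrative, not data from the paper.

total_shares = 10_000   # total number of shares to sell
num_periods = 10        # number of execution periods

# Note 3: a TWAP child order is an equal slice of the parent order.
twap_order = total_shares / num_periods   # 1000.0 shares per period

# Note 4: implementation shortfall (IS) of one child order.
arrival_price = 100.00    # price when the parent order arrived
executed_price = 99.95    # average fill price of the child order
traded_volume = twap_order
is_child = arrival_price * traded_volume - executed_price * traded_volume
print(is_child)           # ~50.0 -> the fill cost ~$50 versus arrival

# Notes 5-6: IS_t is the shortfall of the child order executed at time t,
# and AvgIS is the mean shortfall across all periods.
is_t = [50.0, 30.0, -10.0]        # per-period shortfalls (illustrative)
avg_is = sum(is_t) / len(is_t)    # ~23.33
```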


Author information

Correspondence to Siyu Lin.

A Hyperparameters

We fine-tuned the hyperparameters on FB only. Due to limited computing resources, we did not perform an exhaustive grid search over the hyperparameter space, but instead drew random samples from it. We chose the hyperparameters listed in Table 3.
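A minimal sketch of such a random-sampling procedure is given below; the search space and parameter names are illustrative assumptions rather than the paper's actual space (the values actually chosen are those in Table 3), and the sketch is not tied to any particular tuning library.

```python
import random

# Illustrative search space: each entry maps a hyperparameter name to a
# sampler. These names and ranges are assumptions for demonstration only.
space = {
    "learning_rate": lambda: 10 ** random.uniform(-5, -3),
    "gamma": lambda: random.choice([0.99, 0.995, 1.0]),
    "hidden_units": lambda: random.choice([64, 128, 256]),
    "batch_size": lambda: random.choice([32, 64, 128]),
}

def sample_config():
    # Draw one random configuration from the space.
    return {name: draw() for name, draw in space.items()}

# e.g., evaluate 20 random configurations instead of a full grid search.
trials = [sample_config() for _ in range(20)]
```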


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Lin, S., Beling, P.A. (2021). A Deep Reinforcement Learning Framework for Optimal Trade Execution. In: Dong, Y., Ifrim, G., Mladenić, D., Saunders, C., Van Hoecke, S. (eds) Machine Learning and Knowledge Discovery in Databases. Applied Data Science and Demo Track. ECML PKDD 2020. Lecture Notes in Computer Science, vol. 12461. Springer, Cham. https://doi.org/10.1007/978-3-030-67670-4_14


  • DOI: https://doi.org/10.1007/978-3-030-67670-4_14


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-67669-8

  • Online ISBN: 978-3-030-67670-4

  • eBook Packages: Computer Science (R0)
