
A Deep Reinforcement Learning Framework for Optimal Trade Execution

  • Conference paper
Machine Learning and Knowledge Discovery in Databases. Applied Data Science and Demo Track (ECML PKDD 2020)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12461)

Abstract

In this article, we propose a deep reinforcement learning based framework that learns to minimize trade execution costs by splitting a sell order into child orders and executing them sequentially over a fixed period. The framework is based on a variant of the Deep Q-Network (DQN) algorithm that integrates Double DQN, the Dueling Network architecture, and Noisy Nets. In contrast to previous research work, which uses implementation shortfall as the immediate reward, we use a shaped reward structure, and we also incorporate the zero-ending-inventory constraint into the DQN algorithm by slightly modifying the Q-function update at the final step relative to standard Q-learning.

We demonstrate that the DQN-based optimal trade execution framework (1) converges quickly during the training phase, (2) outperforms TWAP, VWAP, Almgren-Chriss (AC), and two DQN algorithms in backtests on 14 US equities, and (3) improves stability by incorporating the zero-ending-inventory constraint.
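The final-step modification can be sketched in a few lines. The following is a minimal PyTorch illustration, assuming a Double DQN target and a per-sample boolean mask `is_final_step`; the tensor names and the handling of the forced liquidation are illustrative assumptions, not the authors' exact implementation.

```python
import torch

def double_dqn_target(reward, next_q_online, next_q_target, is_final_step,
                      gamma=1.0):
    """Double DQN target with a zero-ending-inventory modification.

    reward, is_final_step: shape [batch]; next_q_*: shape [batch, n_actions].
    """
    # Standard Double DQN: the online network selects the greedy next
    # action, and the target network evaluates it.
    best_next = next_q_online.argmax(dim=1, keepdim=True)
    bootstrap = next_q_target.gather(1, best_next).squeeze(1)
    target = reward + gamma * bootstrap
    # Zero-ending-inventory constraint: at the final step any remaining
    # shares must be liquidated, so no future value is bootstrapped; the
    # target collapses to the immediate (shaped) reward of that forced
    # liquidation.
    return torch.where(is_final_step, reward, target)
```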


Notes

  1. For example, the differences in immediate rewards between the current period and the arrival time at various volume levels, and manually crafted indicators that flag specific market scenarios (e.g., a regime shift or a significant trend in price changes).

  2. TWAP denotes the trading volume of the TWAP strategy in one step.

  3. \(\text {TWAP order}= \frac{\text {total }\#\text { of shares to trade} }{\text {total } \#\text { of periods}}\).

  4. \(\text {Implementation Shortfall (IS)} = \text {arrival price}\times \text {traded volume} - \text {executed price}\times \text {traded volume}\); see the worked example after these notes.

  5. \(\mathrm {AvgIS}\) is the average IS.

  6. \(\mathrm {IS}_{t}\) is the IS at time \(t\).

  7. Please refer to the appendix for the chosen hyperparameters.
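As a concrete illustration of the quantities defined in notes 3 through 6, here is a minimal worked example; all prices and volumes are made-up values, not data from the paper.

```python
# A worked example of the quantities in notes 3-6; all numbers are
# illustrative, not data from the paper.

total_shares = 10_000   # total number of shares to sell
num_periods = 10        # number of execution periods

# Note 3: a TWAP child order is an equal slice of the parent order.
twap_order = total_shares / num_periods   # 1000.0 shares per period

# Note 4: implementation shortfall (IS) of one child order.
arrival_price = 100.00    # price when the parent order arrived
executed_price = 99.95    # average fill price of the child order
traded_volume = twap_order
is_child = arrival_price * traded_volume - executed_price * traded_volume
print(is_child)           # ~50.0 -> the fill cost ~$50 versus arrival

# Notes 5-6: IS_t is the shortfall of the child order executed at time t,
# and AvgIS is the mean shortfall across all periods.
is_t = [50.0, 30.0, -10.0]        # per-period shortfalls (illustrative)
avg_is = sum(is_t) / len(is_t)    # ~23.33
```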


Author information

Correspondence to Siyu Lin.

A Hyperparameters

We fine-tuned the hyperparameters on FB only. Due to limited computing resources, we did not perform an exhaustive grid search over the hyperparameter space, but instead drew random samples from it. We chose the hyperparameters listed in Table 3.
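A minimal sketch of such a random-sampling procedure is given below; the search space and parameter names are illustrative assumptions rather than the paper's actual space (the values actually chosen are those in Table 3), and the sketch is not tied to any particular tuning library.

```python
import random

# Illustrative search space: each entry maps a hyperparameter name to a
# sampler. These names and ranges are assumptions for demonstration only.
space = {
    "learning_rate": lambda: 10 ** random.uniform(-5, -3),
    "gamma": lambda: random.choice([0.99, 0.995, 1.0]),
    "hidden_units": lambda: random.choice([64, 128, 256]),
    "batch_size": lambda: random.choice([32, 64, 128]),
}

def sample_config():
    # Draw one random configuration from the space.
    return {name: draw() for name, draw in space.items()}

# e.g., evaluate 20 random configurations instead of a full grid search.
trials = [sample_config() for _ in range(20)]
```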


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Lin, S., Beling, P.A. (2021). A Deep Reinforcement Learning Framework for Optimal Trade Execution. In: Dong, Y., Ifrim, G., Mladenić, D., Saunders, C., Van Hoecke, S. (eds) Machine Learning and Knowledge Discovery in Databases. Applied Data Science and Demo Track. ECML PKDD 2020. Lecture Notes in Computer Science, vol. 12461. Springer, Cham. https://doi.org/10.1007/978-3-030-67670-4_14


  • DOI: https://doi.org/10.1007/978-3-030-67670-4_14


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-67669-8

  • Online ISBN: 978-3-030-67670-4

  • eBook Packages: Computer Science (R0)
