Abstract
In this article, we propose a deep reinforcement learning based framework that learns to minimize trade execution costs by splitting a sell order into child orders and executing them sequentially over a fixed period. The framework is based on a variant of the Deep Q-Network (DQN) algorithm that integrates Double DQN, the Dueling Network architecture, and Noisy Nets. In contrast to previous work, which uses implementation shortfall as the immediate reward, we use a shaped reward structure, and we incorporate the zero-ending-inventory constraint into the DQN algorithm by slightly modifying the Q-function update at the final step relative to standard Q-learning.
We demonstrate that the DQN-based optimal trade execution framework (1) converges quickly during the training phase, (2) outperforms TWAP, VWAP, AC, and two DQN-based algorithms in backtests on 14 US equities, and (3) improves stability by incorporating the zero-ending-inventory constraint.
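To make the final-step modification concrete, the minimal sketch below contrasts the standard DQN target with a terminal target in which any remaining inventory must be liquidated, so no bootstrapped maximum over next-state actions is used. The function names and the toy reward are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Minimal sketch (assumed names, not the authors' code): standard DQN target
# versus a final-step target that enforces the zero-ending-inventory constraint.

def dqn_target(reward, next_q_values, done, gamma=0.99):
    """Standard Q-learning target: r + gamma * max_a' Q(s', a')."""
    return reward if done else reward + gamma * float(np.max(next_q_values))

def final_step_target(remaining_inventory, liquidation_reward_fn):
    """At the last step the agent has no real choice: any leftover shares must
    be sold, so the target is simply the reward of liquidating the remaining
    inventory, with no max over next-state actions."""
    return liquidation_reward_fn(remaining_inventory)

if __name__ == "__main__":
    toy_reward = lambda shares: -0.01 * shares        # assumed per-share liquidation cost
    print(dqn_target(1.5, np.array([0.2, 0.7, -0.1]), done=False))  # 1.5 + 0.99 * 0.7
    print(final_step_target(300, toy_reward))                        # -3.0
```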
Notes
- 1. For example, the differences in immediate reward between the current time period and the arrival time at various volume levels, and manually crafted indicators that flag specific market scenarios (e.g., a regime shift, a significant trend in price changes, and so on).
- 2. TWAP represents the trading volume of the TWAP strategy in one step.
- 3. \(\text {TWAP order}= \frac{\text {Total }\#\text { of shares to trade} }{\text {Total } \#\text { of periods}}\).
- 4. \(\text {Implementation Shortfall}=\text {arrival price}\times \text {traded volume} - \text {executed price}\times \text {traded volume}\) (a short numerical sketch follows this list).
- 5. \(\mathrm {AvgIS}\) is the average IS.
- 6. \(\mathrm {IS_{t}}\) is the IS at time \(t\).
- 7. Please refer to the appendix for the chosen hyperparameters.
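As a quick numerical illustration of notes 3 and 4, the following sketch computes a TWAP child-order size and the implementation shortfall for one step; all numbers are hypothetical and not data from the paper.

```python
# Hypothetical numbers illustrating notes 3 and 4 (not results from the paper).

total_shares = 12_000                 # total number of shares to trade
num_periods = 60                      # total number of periods
twap_order = total_shares / num_periods
print(twap_order)                     # 200.0 shares per period (note 3)

arrival_price = 100.00                # price at order arrival
executed_price = 99.50                # average execution price in this step
traded_volume = 200                   # shares traded in this step

# Note 4: IS = arrival price * traded volume - executed price * traded volume
implementation_shortfall = arrival_price * traded_volume - executed_price * traded_volume
print(implementation_shortfall)       # 100.0, i.e., a cost of $100 vs. the arrival price
```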
References
Abelson, H., Sussman, G.-J., Sussman, J.: Structure and Interpretation of Computer Programs. MIT Press, Cambridge (1985)
Baumgartner, R., Gottlob, G., Flesca, S.: Visual information extraction with Lixto. In: Proceedings of the 27th International Conference on Very Large Databases, pp. 119–128. Morgan Kaufmann, Rome (2001)
Bertsimas, D., Lo, A.-W.: Optimal control of execution costs. J. Finan. Mark. 1(1), 1–50 (1998)
Brachman, R.-J., Schmolze, J.-G.: An overview of the KL-ONE knowledge representation system. Cogn. Sci. 9(2), 171–216 (1985)
Gottlob, G.: Complexity results for nonmonotonic logics. J. Logic Comput. 2(3), 397–425 (1992)
Gottlob, G., Leone, N., Scarcello, F.: Hypertree decompositions and tractable queries. J. Comput. Syst. Sci. 64(3), 579–627 (2002)
Levesque, H.-J.: Foundations of a functional approach to knowledge representation. Artif. Intell 23(2), 155–212 (1984)
Levesque, H.-J.: A logic of implicit and explicit belief. In: Proceedings of the Fourth National Conference on Artificial Intelligence, pp. 198–202. American Association for Artificial Intelligence, Austin (1984)
Nebel, B.: On the compilability and expressive power of propositional planning formalisms. J. Artif. Intell. Res. 12, 271–315 (2000)
Huberman, G., Stanzl, W.: Optimal liquidity trading. Rev. Finan. 9(2), 165–200 (2005)
Almgren, R., Chriss, N.: Optimal execution of portfolio transactions. J. Risk 3, 5–40 (2000)
Berkowitz, S.-A., Logue, D.-E., Noser Jr., E.-A.: The total cost of transactions on the NYSE. J. Finan. 43(1), 97–112 (1988)
Nevmyvaka, Y., Feng, Y., Kearns, M.: Reinforcement learning for optimal trade execution. In: Proceedings of the 23rd International Conference on Machine Learning, pp. 673–680. Association for Computing Machinery, Pittsburgh (2006)
Mnih, V., et al.: Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015)
Hornik, K., Stinchcombe, M., White, H.: Multilayer feedforward networks are universal approximators. Neural Netw. 2(5), 359–366 (1989)
Hendricks, D., Wilcox, D.: A reinforcement learning extension to the Almgren-Chriss framework for optimal trade execution. In: Proceedings from IEEE Conference on Computational Intelligence for Financial Economics and Engineering, pp. 457–464. IEEE, London (2014)
Mnih, V., et al.: Playing atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602 (2013)
Silver, D., et al.: Mastering the game of Go without human knowledge. Nature 550, 354–359 (2017)
Brockman, G., et al.: OpenAI Gym. arXiv preprint arXiv:1606.01540 (2016)
Tsitsiklis, J.-N., Van Roy, B.: An analysis of temporal-difference learning with function approximation. IEEE Trans. Autom. Control 42(5), 674–690 (1997)
Sutton, R., Barto, A.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (1998)
Lundberg, S.-M., Lee, S.-I.: A unified approach to interpreting model predictions. In: Advances in Neural Information Processing Systems, Long Beach, CA, pp. 4768–4777 (2017)
Ning, B., Lin, F.-H.-T., Jaimungal, S.: Double deep Q-learning for optimal execution. arXiv preprint arXiv:1812.06600 (2018)
van Hasselt, H.: Double Q-learning. In: Advances in Neural Information Processing Systems, Vancouver, British Columbia, Canada, pp. 2613–2621 (2010)
van Hasselt, H., Guez, A., Silver, D.: Deep reinforcement learning with double Q-learning. In: Proceedings of the 30th AAAI Conference on Artificial Intelligence, Phoenix, Arizona, pp. 2094–2100 (2016)
Wang, Z., Schaul, T., Hessel, M., van Hasselt, H., Lanctot, M., de Freitas, N.: Dueling network architectures for deep reinforcement learning. In: Proceedings of the 33rd International Conference on Machine Learning, pp. 1995–2003. JMLR, New York (2016)
Fortunato, M., et al.: Noisy networks for exploration. In: International Conference on Learning Representations, Vancouver, British Columbia, Canada (2018)
Bacoyannis, V., Glukhov, V., Jin, T., Kochems, J., Song, D.-R.: Idiosyncrasies and challenges of data driven learning in electronic trading. In: NIPS 2018 Workshop on Challenges and Opportunities for AI in Financial Services: The Impact of Fairness, Montréal, Canada (2018)
Liaw, R., Liang, E., Nishihara, R., Moritz, P., Gonzalez, J.-E., Stoica, I.: Tune: a research platform for distributed model selection and training. arXiv preprint arXiv:1807.05118 (2018)
Liang, E., et al.: RLlib: abstractions for distributed reinforcement learning. In: Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden (2018)
A Hyperparameters
We fine-tuned the hyperparameters on FB only. Due to limited computing resources, we did not perform an exhaustive grid search over the hyperparameter space, but instead drew random samples from it. The chosen hyperparameters are listed in Table 3.
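A minimal sketch of this random-search procedure is shown below; the search-space ranges and the placeholder evaluate function are illustrative assumptions, since only the final values in Table 3 come from the paper.

```python
import random

# Minimal sketch of random hyperparameter search (ranges and evaluate() are
# illustrative assumptions; the chosen values are reported in Table 3).
search_space = {
    "learning_rate": lambda: 10 ** random.uniform(-5, -3),
    "gamma": lambda: random.uniform(0.90, 0.999),
    "hidden_units": lambda: random.choice([64, 128, 256]),
    "batch_size": lambda: random.choice([32, 64, 128]),
}

def sample_config():
    """Draw one random configuration from the search space."""
    return {name: sampler() for name, sampler in search_space.items()}

def evaluate(config):
    """Placeholder objective: in practice, train the agent on FB with this
    configuration and return a validation metric such as average reward."""
    return random.random()

best_config, best_score = None, float("-inf")
for _ in range(20):                       # number of random trials
    config = sample_config()
    score = evaluate(config)
    if score > best_score:
        best_config, best_score = config, score
print(best_config, best_score)
```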