Distributional Correlation–Aware Knowledge Distillation for Stock Trading Volume Prediction

  • Conference paper
Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2022)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13718)


Abstract

Traditional knowledge distillation in classification problems transfers knowledge via the class correlations in the soft labels produced by teacher models, which are not available in regression problems like stock trading volume prediction. To remedy this, we present a novel distillation framework for training a lightweight student model to perform trading volume prediction given historical transaction data. Specifically, we turn the regression model into a probabilistic forecasting model by training the models to predict a Gaussian distribution to which the trading volume belongs. The student model can thus learn from the teacher at a more informative distributional level, by matching its predicted distributions to those of the teacher. Two correlational distillation objectives are further introduced to encourage the student to produce pair-wise relationships consistent with those of the teacher model. We evaluate the framework on a real-world stock volume dataset with two different time window settings. Experiments demonstrate that our framework is superior to strong baseline models, compressing the model size by \(5\times \) while maintaining \(99.6\%\) prediction accuracy. Extensive analysis further reveals that our framework is more effective than vanilla distillation methods in low-resource scenarios. Our code and data are available at https://github.com/lancopku/DCKD.
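To give a concrete feel for the objectives described above, the following is a minimal PyTorch-style sketch, assuming the teacher and the student each predict a mean and a standard deviation of the trading volume. The function names, the specific loss forms, and the weights (alpha, beta) are illustrative assumptions rather than the paper's exact formulation; see the linked repository for the official implementation.

    import torch

    def gaussian_kl(mu_s, sigma_s, mu_t, sigma_t):
        # KL( N(mu_s, sigma_s^2) || N(mu_t, sigma_t^2) ), computed element-wise.
        return (torch.log(sigma_t / sigma_s)
                + (sigma_s ** 2 + (mu_s - mu_t) ** 2) / (2 * sigma_t ** 2) - 0.5)

    def gaussian_cosine(mu, sigma):
        # Pairwise cosine similarity between the Gaussians in a batch,
        # using the closed form derived in Appendix A.
        mu_i, mu_j = mu.unsqueeze(1), mu.unsqueeze(0)                      # (B, 1), (1, B)
        var_i, var_j = (sigma ** 2).unsqueeze(1), (sigma ** 2).unsqueeze(0)
        scale = torch.sqrt(2 * sigma.unsqueeze(1) * sigma.unsqueeze(0) / (var_i + var_j))
        return scale * torch.exp(-(mu_i - mu_j) ** 2 / (2 * (var_i + var_j)))

    def distillation_loss(mu_s, sigma_s, mu_t, sigma_t, y, alpha=0.5, beta=0.5):
        # sigma_s and sigma_t are assumed strictly positive (e.g. softplus outputs).
        # Ground-truth term: Gaussian negative log-likelihood of the observed volume y.
        nll = 0.5 * torch.log(2 * torch.pi * sigma_s ** 2) + (y - mu_s) ** 2 / (2 * sigma_s ** 2)
        # Distribution-level distillation: pull the student's Gaussian towards the teacher's.
        kl = gaussian_kl(mu_s, sigma_s, mu_t, sigma_t)
        # Correlational distillation: match the student's pairwise similarity
        # structure within a batch to that of the teacher.
        corr = (gaussian_cosine(mu_s, sigma_s) - gaussian_cosine(mu_t, sigma_t)) ** 2
        return nll.mean() + alpha * kl.mean() + beta * corr.mean()

The pairwise term reuses the cosine similarity between Gaussian distributions whose closed form is derived in Appendix A below.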

Acknowledgements

We thank all the anonymous reviewers for their constructive comments. This work is supported by a Research Grant from Mizuho Securities Co., Ltd. We sincerely thank Mizuho Securities for their valuable domain-expert suggestions and for providing the experimental dataset.

Author information

Correspondence to Ruihan Bao or Xu Sun.

A Cosine Similarity of Gaussian Distributions

Proof

The inner product and the cosine similarity of \(\mathcal {N}_i\left( \mu _i, \sigma _i\right) \) and \(\mathcal {N}_j\left( \mu _j, \sigma _j\right) \) are:

$$\begin{aligned} (\mathcal {N}_i, \mathcal {N}_j)&=\int _{-\infty }^{+\infty }\mathcal {N}_i(t\mid \mu _i,\sigma _i)\,\mathcal {N}_j(t\mid \mu _j,\sigma _j)\,dt\\&=\int _{-\infty }^{+\infty }\frac{1}{2\pi \sigma _i\sigma _j}\exp \left( -\frac{(t-\mu _i)^2}{2\sigma _i^2}-\frac{(t-\mu _j)^2}{2\sigma _j^2}\right) dt\\&=\int _{-\infty }^{+\infty }\frac{1}{2\pi \sigma _i\sigma _j}\exp \left( -\frac{(t-\mu ')^2}{2\frac{\sigma _i^2\sigma _j^2}{\sigma _i^2+\sigma _j^2}}-\frac{(\mu _i-\mu _j)^2}{2(\sigma _i^2+\sigma _j^2)}\right) dt\\&=\frac{1}{\sqrt{2\pi (\sigma _i^2+\sigma _j^2)}}\exp \left( -\frac{(\mu _i-\mu _j)^2}{2(\sigma _i^2+\sigma _j^2)}\right) \\ \end{aligned}$$

where \(\mu '=\frac{\mu _i\sigma _j^2+\mu _j\sigma _i^2}{\sigma _i^2+\sigma _j^2}\). Since \((\mathcal {N}_i, \mathcal {N}_i)=\frac{1}{\sqrt{4\pi \sigma _i^2}}\) and \((\mathcal {N}_j, \mathcal {N}_j)=\frac{1}{\sqrt{4\pi \sigma _j^2}}\), the cosine similarity is

$$\begin{aligned} \varphi _\text {Cosine}\left( \mathcal {N}_i, \mathcal {N}_j\right)&=\frac{(\mathcal {N}_i, \mathcal {N}_j)}{\sqrt{(\mathcal {N}_i, \mathcal {N}_i)(\mathcal {N}_j, \mathcal {N}_j)}}\\&= \frac{\sqrt{(4\pi \sigma _i^2)^{\frac{1}{2}}(4\pi \sigma _j^2)^{\frac{1}{2}}}}{\sqrt{2\pi (\sigma _i^2+\sigma _j^2)}}\exp \left( -\frac{(\mu _i-\mu _j)^2}{2(\sigma _i^2+\sigma _j^2)}\right) \\&= \sqrt{\frac{2\sigma _i\sigma _j}{{\sigma _i^2 + \sigma _j^2}}}\exp \left( - \frac{\left( \mu _i - \mu _j\right) ^2}{2\left( \sigma _i^2 + \sigma _j^2\right) } \right) \end{aligned}$$
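The closed form can be sanity-checked numerically. The short Python snippet below (the helper names are ours, chosen for illustration) compares the expression above with brute-force numerical integration of the defining inner products:

    import numpy as np

    def pdf(t, mu, sigma):
        # Density of N(mu, sigma^2) at t.
        return np.exp(-(t - mu) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

    def inner(mu_i, s_i, mu_j, s_j):
        # Numerical inner product (N_i, N_j) = \int N_i(t) N_j(t) dt on a fine grid.
        t = np.linspace(-60.0, 60.0, 600_001)
        return np.trapz(pdf(t, mu_i, s_i) * pdf(t, mu_j, s_j), t)

    def cosine_closed_form(mu_i, s_i, mu_j, s_j):
        # Closed form derived above.
        return (np.sqrt(2 * s_i * s_j / (s_i ** 2 + s_j ** 2))
                * np.exp(-(mu_i - mu_j) ** 2 / (2 * (s_i ** 2 + s_j ** 2))))

    mu_i, s_i, mu_j, s_j = 1.0, 2.0, -0.5, 0.8
    numeric = inner(mu_i, s_i, mu_j, s_j) / np.sqrt(
        inner(mu_i, s_i, mu_i, s_i) * inner(mu_j, s_j, mu_j, s_j))
    print(numeric, cosine_closed_form(mu_i, s_i, mu_j, s_j))  # the two values agree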

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Li, L., Zhang, Z., Bao, R., Harimoto, K., Sun, X. (2023). Distributional Correlation–Aware Knowledge Distillation for Stock Trading Volume Prediction. In: Amini, M.R., Canu, S., Fischer, A., Guns, T., Kralj Novak, P., Tsoumakas, G. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2022. Lecture Notes in Computer Science, vol 13718. Springer, Cham. https://doi.org/10.1007/978-3-031-26422-1_7

  • DOI: https://doi.org/10.1007/978-3-031-26422-1_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-26421-4

  • Online ISBN: 978-3-031-26422-1

  • eBook Packages: Computer Science, Computer Science (R0)
