Abstract
Traditional knowledge distillation in classification problems transfers knowledge via the class correlations in the soft labels produced by the teacher model, which are not available in regression problems such as stock trading volume prediction. To remedy this, we present a novel distillation framework for training a lightweight student model to perform trading volume prediction given historical transaction data. Specifically, we turn the regression model into a probabilistic forecasting model, by training models to predict a Gaussian distribution that the trading volume follows. The student model can thus learn from the teacher at a more informative distributional level, by matching its predicted distributions to those of the teacher. Two correlational distillation objectives are further introduced to encourage the student to produce pair-wise relationships consistent with the teacher model. We evaluate the framework on a real-world stock volume dataset with two different time window settings. Experiments demonstrate that our framework is superior to strong baseline models, compressing the model size by \(5\times \) while maintaining \(99.6\%\) prediction accuracy. Extensive analysis further reveals that our framework is more effective than vanilla distillation methods under low-resource scenarios. Our code and data are available at https://github.com/lancopku/DCKD.
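As a rough illustration of the idea summarized above (not the paper's actual objectives, whose exact forms are defined in the main text), the sketch below shows how a distributional distillation term between Gaussian predictions and a pair-wise correlational term could be written; all function names and the specific loss forms are assumptions for illustration only.

```python
import torch


def gaussian_kl(mu_s, sigma_s, mu_t, sigma_t):
    """Illustrative distributional term: closed-form KL(teacher || student)
    between univariate Gaussians, pulling the student's N(mu_s, sigma_s^2)
    toward the teacher's N(mu_t, sigma_t^2)."""
    return (torch.log(sigma_s / sigma_t)
            + (sigma_t ** 2 + (mu_t - mu_s) ** 2) / (2 * sigma_s ** 2)
            - 0.5)


def pairwise_consistency(mu_s, mu_t):
    """Illustrative correlational term: match the student's pair-wise
    relationships between samples in a batch (here, pairwise differences
    of predicted means) to those of the teacher."""
    diff_s = mu_s.unsqueeze(0) - mu_s.unsqueeze(1)
    diff_t = mu_t.unsqueeze(0) - mu_t.unsqueeze(1)
    return torch.mean((diff_s - diff_t) ** 2)


# Toy usage: batch of 4 predicted Gaussians from student and teacher heads.
mu_s, sigma_s = torch.randn(4), torch.rand(4) + 0.1
mu_t, sigma_t = torch.randn(4), torch.rand(4) + 0.1
loss = gaussian_kl(mu_s, sigma_s, mu_t, sigma_t).mean() + pairwise_consistency(mu_s, mu_t)
```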
Acknowledgements
We thank all the anonymous reviewers for their constructive comments. This work is supported by a Research Grant from Mizuho Securities Co., Ltd. We sincerely thank Mizuho Securities for their valuable domain-expert suggestions and for providing the experimental dataset.
A Cosine Similarity of Gaussian Distributions
Proof
The inner product and the cosine similarity of \(\mathcal {N}_i\left( \mu _i, \sigma _i\right) \) and \(\mathcal {N}_j\left( \mu _j, \sigma _j\right) \) are:
\[
(\mathcal {N}_i, \mathcal {N}_j) = \int _{-\infty }^{\infty } \mathcal {N}_i(x)\,\mathcal {N}_j(x)\,dx = \frac{1}{\sqrt{2\pi \left( \sigma _i^2+\sigma _j^2\right) }}\exp \left( -\frac{\left( \mu _i-\mu _j\right) ^2}{2\left( \sigma _i^2+\sigma _j^2\right) }\right) ,
\]
where the product \(\mathcal {N}_i(x)\,\mathcal {N}_j(x)\) is proportional to a Gaussian density centered at \(\mu '=\frac{\mu _i\sigma _j^2+\mu _j\sigma _i^2}{\sigma _i^2+\sigma _j^2}\), which integrates to one. Setting \(i=j\) gives \((\mathcal {N}_i, \mathcal {N}_i)=\frac{1}{\sqrt{4\pi \sigma _i^2}}\) and \((\mathcal {N}_j, \mathcal {N}_j)=\frac{1}{\sqrt{4\pi \sigma _j^2}}\), so the cosine similarity follows as
\[
\cos (\mathcal {N}_i, \mathcal {N}_j) = \frac{(\mathcal {N}_i, \mathcal {N}_j)}{\sqrt{(\mathcal {N}_i, \mathcal {N}_i)(\mathcal {N}_j, \mathcal {N}_j)}} = \sqrt{\frac{2\sigma _i\sigma _j}{\sigma _i^2+\sigma _j^2}}\,\exp \left( -\frac{\left( \mu _i-\mu _j\right) ^2}{2\left( \sigma _i^2+\sigma _j^2\right) }\right) .
\]
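The closed form above can be sanity-checked numerically; the short sketch below (not part of the paper, names chosen for illustration) compares it against direct numerical integration of the \(L^2\) inner products.

```python
import numpy as np
from scipy import integrate
from scipy.stats import norm


def cosine_closed_form(mu_i, sigma_i, mu_j, sigma_j):
    """Closed-form cosine similarity of N(mu_i, sigma_i^2) and N(mu_j, sigma_j^2)."""
    var_sum = sigma_i ** 2 + sigma_j ** 2
    return (np.sqrt(2 * sigma_i * sigma_j / var_sum)
            * np.exp(-(mu_i - mu_j) ** 2 / (2 * var_sum)))


def cosine_numerical(mu_i, sigma_i, mu_j, sigma_j):
    """Cosine similarity computed by numerically integrating the densities."""
    inner = lambda f, g: integrate.quad(lambda x: f(x) * g(x), -np.inf, np.inf)[0]
    p_i = lambda x: norm.pdf(x, mu_i, sigma_i)
    p_j = lambda x: norm.pdf(x, mu_j, sigma_j)
    return inner(p_i, p_j) / np.sqrt(inner(p_i, p_i) * inner(p_j, p_j))


# The two values agree up to numerical integration error.
print(cosine_closed_form(0.3, 0.8, 1.2, 1.5))
print(cosine_numerical(0.3, 0.8, 1.2, 1.5))
```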