Abstract
When modeling user-item interaction sequences to extract sequential patterns, current recommender systems face two problems: (a) long-distance dependencies and (b) high levels of noise. In addition, the complexity of current recommendation model architectures significantly increases computation time, so these models cannot meet the fast-response requirements of application scenarios such as online advertising. To address these issues, we propose a Knowledge Distilled Attention-based Latent Information Extraction Network for Sequential user behavior (KD-ALIENS). The model uses user and item attributes and interaction histories to capture latent information from high-order feature interactions together with sequential user behavior. To handle long-distance dependencies and noise, we adopt a self-attention mechanism to learn the sequential patterns between items in a user's interaction history. To meet the fast-response constraint that a complex architecture cannot, model compression and acceleration are realized by: (a) a knowledge-distilled teacher and student module, in which the complex teacher module extracts a user's general preference from high-order feature interactions and the sequential patterns of long history sequences; and (b) a sampling method that draws both relatively long-term and short-term item histories. Experimental studies on two real-world datasets demonstrate considerable improvements in click-through rate (CTR) prediction accuracy over strong baseline models, and also show the effectiveness of the student-model compression and acceleration in terms of speed.
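To make the teacher-student compression idea concrete, the following is a minimal sketch of a Hinton-style distillation objective adapted to binary CTR prediction. This is not the paper's exact loss; the function names, the temperature `T`, and the blending weight `alpha` are illustrative assumptions. The student is trained against a blend of the hard click labels and the teacher's temperature-softened click probabilities:

```python
import numpy as np

def sigmoid(z):
    """Map logits to click probabilities."""
    return 1.0 / (1.0 + np.exp(-z))

def bce(targets, preds, eps=1e-7):
    """Binary cross-entropy of predictions against (possibly soft) targets."""
    preds = np.clip(preds, eps, 1.0 - eps)
    return -np.mean(targets * np.log(preds) + (1.0 - targets) * np.log(1.0 - preds))

def distillation_loss(student_logits, teacher_logits, labels, alpha=0.5, T=2.0):
    """Blend of hard-label loss and soft-target loss (T**2 scaling per Hinton et al.)."""
    soft_targets = sigmoid(teacher_logits / T)   # teacher's softened CTR estimates
    student_soft = sigmoid(student_logits / T)   # student at the same temperature
    student_hard = sigmoid(student_logits)       # student at temperature 1
    return ((1.0 - alpha) * bce(labels, student_hard)
            + alpha * (T ** 2) * bce(soft_targets, student_soft))
```

In this sketch the compact student never sees the teacher's parameters at serving time; it only inherits the teacher's "dark knowledge" through the soft targets during training, which is what allows the small model to approach the accuracy of the complex one while responding quickly.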





Funding
This work is supported by the National Key Research and Development Program of China (2018YFB1403501).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Abbreviations Appendix: The following abbreviations are used in this paper:
Cite this article
Huang, R., McIntyre, S., Song, M. et al. A knowledge distilled attention-based latent information extraction network for sequential user behavior. Multimed Tools Appl 82, 1017–1043 (2023). https://doi.org/10.1007/s11042-022-12513-y