Abstract
When modeling user-item interaction sequences to extract sequential patterns, current recommender systems face two problems: (a) long-distance dependencies and (b) high levels of noise. In addition, the complexity of current recommendation model architectures significantly increases computation time, so these models cannot meet the fast-response requirements of application scenarios such as online advertising. To address these issues, we propose a Knowledge Distilled Attention-based Latent Information Extraction Network for Sequential user behavior (KD-ALIENS). The model uses user and item attributes and interaction histories to capture latent information from high-order feature interactions together with sequential user behavior. To handle long-distance dependencies and noise, we adopt a self-attention mechanism to learn the sequential patterns between items in a user's interaction history. To meet the fast-response constraint that a complex architecture cannot, model compression and acceleration are realized by: (a) a knowledge-distilled teacher and student module, in which the complex teacher module extracts a user's general preference from high-order feature interactions and the sequential patterns of long history sequences; and (b) a sampling method that draws both relatively long-term and short-term item histories. Experimental studies on two real-world datasets demonstrate considerable improvements in click-through rate (CTR) prediction accuracy over strong baseline models, and also show the effectiveness of the student-model compression and acceleration in terms of speed.
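To make the teacher-student compression idea concrete, the following is a minimal sketch of a Hinton-style distillation objective adapted to binary CTR prediction. This is not the paper's exact loss; the function names, the temperature `T`, and the blending weight `alpha` are illustrative assumptions. The student is trained against a blend of the hard click labels and the teacher's temperature-softened click probabilities:

```python
import numpy as np

def sigmoid(z):
    """Map logits to click probabilities."""
    return 1.0 / (1.0 + np.exp(-z))

def bce(targets, preds, eps=1e-7):
    """Binary cross-entropy of predictions against (possibly soft) targets."""
    preds = np.clip(preds, eps, 1.0 - eps)
    return -np.mean(targets * np.log(preds) + (1.0 - targets) * np.log(1.0 - preds))

def distillation_loss(student_logits, teacher_logits, labels, alpha=0.5, T=2.0):
    """Blend of hard-label loss and soft-target loss (T**2 scaling per Hinton et al.)."""
    soft_targets = sigmoid(teacher_logits / T)   # teacher's softened CTR estimates
    student_soft = sigmoid(student_logits / T)   # student at the same temperature
    student_hard = sigmoid(student_logits)       # student at temperature 1
    return ((1.0 - alpha) * bce(labels, student_hard)
            + alpha * (T ** 2) * bce(soft_targets, student_soft))
```

In this sketch the compact student never sees the teacher's parameters at serving time; it only inherits the teacher's "dark knowledge" through the soft targets during training, which is what allows the small model to approach the accuracy of the complex one while responding quickly.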





Funding
This work is supported by the National Key Research and Development Program of China (2018YFB1403501).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Abbreviations Appendix: The following abbreviations are used in this paper:
Cite this article
Huang, R., McIntyre, S., Song, M. et al. A knowledge distilled attention-based latent information extraction network for sequential user behavior. Multimed Tools Appl 82, 1017–1043 (2023). https://doi.org/10.1007/s11042-022-12513-y