
Knowledge distillation meets recommendation: collaborative distillation for top-N recommendation

  • Regular paper
  • Published:
Knowledge and Information Systems

Abstract

Knowledge distillation (KD) is a successful method for transferring knowledge from one model (i.e., the teacher model) to another model (i.e., the student model). Despite the success of KD in classification tasks, applying KD to recommender models is challenging because of the sparsity of positive feedback, the ambiguity of missing feedback, and the ranking nature of top-N recommendation. In this paper, we propose a new KD model for collaborative filtering, namely collaborative distillation (CD). Specifically, (1) we reformulate the loss function to deal with the ambiguity of missing feedback, (2) we exploit probabilistic rank-aware sampling for top-N recommendation, and (3) to train the proposed model effectively, we develop two training strategies for the student model, called teacher-guided and student-guided training, which adaptively select the most beneficial feedback from the teacher model. Furthermore, we extend our model with self-distillation, called born-again CD (BACD), in which teacher and student models of the same capacity are trained using the proposed distillation method. The experimental results demonstrate that CD outperforms the state-of-the-art method by 2.7–33.2% and 2.7–29.9% in hit rate (HR) and normalized discounted cumulative gain (NDCG), respectively. Moreover, BACD improves over the teacher model by 3.5–12.0% and 4.9–13.3% in HR and NDCG, respectively.
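To make the general idea concrete, below is a minimal PyTorch sketch of a distillation objective of this kind: a matrix-factorization scorer, a rank-aware sampler that favors unobserved items the teacher ranks highly, and a loss that combines the usual collaborative-filtering term on observed positives with a soft-label distillation term on the sampled items. The names (SimpleMF, rank_aware_sample, cd_loss), the geometric rank decay, and the loss weighting lam are illustrative assumptions, not the paper's exact formulation or the authors' released code.

```python
# Hedged sketch of rank-aware knowledge distillation for top-N recommendation.
# All identifiers and hyperparameters below are hypothetical, for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleMF(nn.Module):
    """Minimal matrix-factorization scorer usable as either teacher or student."""
    def __init__(self, n_users, n_items, dim):
        super().__init__()
        self.user = nn.Embedding(n_users, dim)
        self.item = nn.Embedding(n_items, dim)

    def forward(self, u, i):
        # Predicted preference of user u for item i (higher = more preferred).
        return (self.user(u) * self.item(i)).sum(-1)

def rank_aware_sample(teacher_scores, k):
    """Sample k unobserved items per user, biased toward the teacher's top ranks.

    teacher_scores: (batch, n_items) teacher predictions on unobserved items.
    Items ranked higher by the teacher get exponentially larger sampling
    probability (a simple geometric scheme standing in for probabilistic
    rank-aware sampling).
    """
    ranks = teacher_scores.argsort(dim=-1, descending=True).argsort(dim=-1)
    probs = torch.exp(-ranks.float() / 10.0)              # decay with rank
    probs = probs / probs.sum(dim=-1, keepdim=True)
    return torch.multinomial(probs, k)                    # (batch, k) item ids

def cd_loss(student_scores, teacher_scores, pos_scores, lam=0.5):
    """Binary CF loss on observed positives plus a soft-label distillation
    loss on the sampled unobserved items, weighted by lam."""
    cf = F.binary_cross_entropy_with_logits(pos_scores, torch.ones_like(pos_scores))
    soft_targets = torch.sigmoid(teacher_scores)           # teacher's soft labels
    kd = F.binary_cross_entropy_with_logits(student_scores, soft_targets)
    return cf + lam * kd
```

Under this sketch, the teacher is trained first in the usual way, and the student then adds the distillation term on the rank-aware sampled items; for the born-again variant (BACD) described above, the same procedure is applied with a student whose capacity matches the teacher's.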



Notes

  1. http://jmcauley.ucsd.edu/data/amazon/.

  2. https://grouplens.org/datasets/movielens/.

  3. https://github.com/hexiangnan/sigir16-eals.

  4. http://dawenl.github.io/data/gowalla_pro.zip.

  5. https://github.com/graytowne/rank_distill.

  6. https://github.com/graytowne/caser_pytorch.


Acknowledgements

This work was supported by the National Research Foundation of Korea (NRF) (NRF-2018R1A5A1060031 and NRF-2021R1F1A1063843). Also, this work was supported by Institute of Information & communications Technology Planning & evaluation (IITP) funded by the Korea government (MSIT) (No. 2020-0-01821, ICT Creative Consilience Program).

Author information

Corresponding author

Correspondence to Jongwuk Lee.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Lee, Jw., Choi, M., Sael, L. et al. Knowledge distillation meets recommendation: collaborative distillation for top-N recommendation. Knowl Inf Syst 64, 1323–1348 (2022). https://doi.org/10.1007/s10115-022-01667-8


  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10115-022-01667-8

