
Knowledge distillation meets recommendation: collaborative distillation for top-N recommendation

  • Regular paper
  • Published:
Knowledge and Information Systems

Abstract

Knowledge distillation (KD) is a successful method for transferring knowledge from one model (i.e., the teacher model) to another model (i.e., the student model). Despite the success of KD in classification tasks, applying KD to recommender models is challenging because of the sparsity of positive feedback, the ambiguity of missing feedback, and the ranking nature of top-N recommendation. In this paper, we propose a new KD model for collaborative filtering, namely collaborative distillation (CD). Specifically, (1) we reformulate the loss function to deal with the ambiguity of missing feedback, (2) we exploit probabilistic rank-aware sampling for top-N recommendation, and (3) to train the proposed model effectively, we develop two training strategies for the student model, called teacher-guided and student-guided training, which adaptively select the most beneficial feedback from the teacher model. Furthermore, we extend our model with self-distillation, called born-again CD (BACD), in which teacher and student models of the same capacity are trained using the proposed distillation method. The experimental results demonstrate that CD outperforms the state-of-the-art method by 2.7–33.2% and 2.7–29.9% in hit rate (HR) and normalized discounted cumulative gain (NDCG), respectively. Moreover, BACD improves over the teacher model by 3.5–12.0% and 4.9–13.3% in HR and NDCG, respectively.
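To make the general idea concrete, below is a minimal PyTorch sketch of a distillation objective of this kind: a matrix-factorization scorer, a rank-aware sampler that favors unobserved items the teacher ranks highly, and a loss that combines the usual collaborative-filtering term on observed positives with a soft-label distillation term on the sampled items. The names (SimpleMF, rank_aware_sample, cd_loss), the geometric rank decay, and the loss weighting lam are illustrative assumptions, not the paper's exact formulation or the authors' released code.

```python
# Hedged sketch of rank-aware knowledge distillation for top-N recommendation.
# All identifiers and hyperparameters below are hypothetical, for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleMF(nn.Module):
    """Minimal matrix-factorization scorer usable as either teacher or student."""
    def __init__(self, n_users, n_items, dim):
        super().__init__()
        self.user = nn.Embedding(n_users, dim)
        self.item = nn.Embedding(n_items, dim)

    def forward(self, u, i):
        # Predicted preference of user u for item i (higher = more preferred).
        return (self.user(u) * self.item(i)).sum(-1)

def rank_aware_sample(teacher_scores, k):
    """Sample k unobserved items per user, biased toward the teacher's top ranks.

    teacher_scores: (batch, n_items) teacher predictions on unobserved items.
    Items ranked higher by the teacher get exponentially larger sampling
    probability (a simple geometric scheme standing in for probabilistic
    rank-aware sampling).
    """
    ranks = teacher_scores.argsort(dim=-1, descending=True).argsort(dim=-1)
    probs = torch.exp(-ranks.float() / 10.0)              # decay with rank
    probs = probs / probs.sum(dim=-1, keepdim=True)
    return torch.multinomial(probs, k)                    # (batch, k) item ids

def cd_loss(student_scores, teacher_scores, pos_scores, lam=0.5):
    """Binary CF loss on observed positives plus a soft-label distillation
    loss on the sampled unobserved items, weighted by lam."""
    cf = F.binary_cross_entropy_with_logits(pos_scores, torch.ones_like(pos_scores))
    soft_targets = torch.sigmoid(teacher_scores)           # teacher's soft labels
    kd = F.binary_cross_entropy_with_logits(student_scores, soft_targets)
    return cf + lam * kd
```

Under this sketch, the teacher is trained first in the usual way, and the student then adds the distillation term on the rank-aware sampled items; for the born-again variant (BACD) described above, the same procedure is applied with a student whose capacity matches the teacher's.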



Notes

  1. http://jmcauley.ucsd.edu/data/amazon/.

  2. https://grouplens.org/datasets/movielens/.

  3. https://github.com/hexiangnan/sigir16-eals.

  4. http://dawenl.github.io/data/gowalla_pro.zip.

  5. https://github.com/graytowne/rank_distill.

  6. https://github.com/graytowne/caser_pytorch.


Acknowledgements

This work was supported by the National Research Foundation of Korea (NRF) (NRF-2018R1A5A1060031 and NRF-2021R1F1A1063843). Also, this work was supported by Institute of Information & communications Technology Planning & evaluation (IITP) funded by the Korea government (MSIT) (No. 2020-0-01821, ICT Creative Consilience Program).

Author information

Corresponding author

Correspondence to Jongwuk Lee.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Lee, Jw., Choi, M., Sael, L. et al. Knowledge distillation meets recommendation: collaborative distillation for top-N recommendation. Knowl Inf Syst 64, 1323–1348 (2022). https://doi.org/10.1007/s10115-022-01667-8


  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10115-022-01667-8

