Abstract
Knowledge distillation (KD) is a well-established method for transferring knowledge from one model (the teacher) to another (the student). Despite its success in classification tasks, applying KD to recommender models is challenging because of the sparsity of positive feedback, the ambiguity of missing feedback, and the ranking-oriented nature of top-N recommendation. In this paper, we propose a new KD model for collaborative filtering, namely collaborative distillation (CD). Specifically, (1) we reformulate the loss function to handle the ambiguity of missing feedback, (2) we exploit probabilistic rank-aware sampling for top-N recommendation, and (3) to train the student model effectively, we develop two training strategies, called teacher-guided and student-guided training, which adaptively select the most beneficial feedback from the teacher model. Furthermore, we extend our model with self-distillation, called born-again CD (BACD): a teacher and a student of equal model capacity are trained using the proposed distillation method. The experimental results demonstrate that CD outperforms the state-of-the-art method by 2.7–33.2% in hit rate (HR) and 2.7–29.9% in normalized discounted cumulative gain (NDCG). Moreover, BACD improves on the teacher model by 3.5–12.0% in HR and 4.9–13.3% in NDCG.
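To illustrate the idea of rank-aware sampling mentioned in the abstract, the following is a minimal sketch, not the paper's exact procedure: it assumes a geometric-style weighting in which an unobserved item's sampling probability decays with its rank under the teacher's scores, so highly ranked items are drawn for distillation more often. The function name, the `exp(-rank / temperature)` weighting, and the `temperature` parameter are illustrative assumptions.

```python
import numpy as np

def rank_aware_sample(teacher_scores, num_samples, temperature=10.0, rng=None):
    """Sample item indices with probability decreasing in teacher rank.

    Illustrative sketch only: items are weighted by exp(-rank / temperature),
    so items the teacher ranks highly are drawn more often. The actual
    sampling distribution used in the paper may differ.
    """
    rng = np.random.default_rng() if rng is None else rng
    scores = np.asarray(teacher_scores, dtype=float)
    # order[0] is the teacher's top-scored item; rank 0 = best.
    order = np.argsort(-scores)
    ranks = np.empty_like(order)
    ranks[order] = np.arange(len(order))
    weights = np.exp(-ranks / temperature)
    probs = weights / weights.sum()
    # Draw distinct item indices, biased toward the teacher's top ranks.
    return rng.choice(len(probs), size=num_samples, replace=False, p=probs)
```

With a small `temperature`, sampling concentrates almost entirely on the teacher's top-ranked items; a larger `temperature` flattens the distribution toward uniform sampling over all unobserved items.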
Acknowledgements
This work was supported by the National Research Foundation of Korea (NRF) (NRF-2018R1A5A1060031 and NRF-2021R1F1A1063843). It was also supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2020-0-01821, ICT Creative Consilience Program).
Lee, Jw., Choi, M., Sael, L. et al. Knowledge distillation meets recommendation: collaborative distillation for top-N recommendation. Knowl Inf Syst 64, 1323–1348 (2022). https://doi.org/10.1007/s10115-022-01667-8