ABSTRACT
In industrial recommendation systems, both data sizes and computational resources vary across scenarios. In scenarios with limited data, data sparsity can degrade model performance. Heterogeneous knowledge distillation-based transfer learning can be used to transfer knowledge from models trained in data-rich domains. In recommendation systems, however, the target domain possesses specific privileged features that contribute significantly to the model, and existing knowledge distillation methods do not take these features into consideration, leading to suboptimal transfer weights. To overcome this limitation, we propose a novel algorithm called Uncertainty-based Heterogeneous Privileged Knowledge Distillation (UHPKD). Our method quantifies the knowledge of both the source and target models, represented by their uncertainty, and derives transfer weights from the knowledge gain, i.e., the difference in knowledge between the source and target domains. Experiments conducted on both public and industrial datasets demonstrate the superiority of our UHPKD algorithm over other state-of-the-art methods.
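To make the high-level idea concrete, below is a minimal sketch of one plausible instantiation: it assumes Monte Carlo dropout as the proxy for model uncertainty (i.e., knowledge) and a simple ReLU-normalized difference of uncertainties as the knowledge gain that weights a soft-label distillation loss. All function names, shapes, and the weighting scheme are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def mc_dropout_uncertainty(model, x, n_samples=8):
    """Estimate per-sample predictive uncertainty with Monte Carlo dropout:
    keep dropout active and take the variance over repeated stochastic
    forward passes. A stand-in for the paper's knowledge quantification."""
    model.train()  # keep dropout layers stochastic at inference time
    with torch.no_grad():
        preds = torch.stack([torch.sigmoid(model(x)) for _ in range(n_samples)])
    return preds.var(dim=0)  # shape: (batch, 1) for a binary CTR model

def knowledge_gain_weights(src_unc, tgt_unc, eps=1e-8):
    """Per-sample transfer weights from the 'knowledge gain': transfer more
    where the source (teacher) is more certain than the target (student),
    and suppress transfer where it is not. Hypothetical weighting scheme."""
    gain = tgt_unc - src_unc  # positive => source knows more than target
    return torch.relu(gain) / (gain.abs().max() + eps)

def weighted_kd_loss(student_logits, teacher_logits, weights, T=2.0):
    """Soft-label distillation loss for a binary CTR task, modulated by
    the per-sample knowledge-gain weights."""
    s = torch.sigmoid(student_logits / T)
    t = torch.sigmoid(teacher_logits / T).detach()
    kd = F.binary_cross_entropy(s, t, reduction="none")
    return (weights * kd).mean() * (T * T)

# Hypothetical usage: teacher trained on the data-rich source domain,
# student trained on the sparse target domain with privileged features.
# src_unc = mc_dropout_uncertainty(teacher, x_shared)
# tgt_unc = mc_dropout_uncertainty(student, x_target)  # incl. privileged features
# w = knowledge_gain_weights(src_unc, tgt_unc)
# loss = ctr_loss + weighted_kd_loss(student(x_target), teacher(x_shared), w)
```

Any calibrated uncertainty estimate (e.g., deep ensembles) could replace the MC-dropout proxy here; the essential ingredient is that transfer strength grows with the teacher's certainty advantage over the student.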