DOI: 10.1145/3539618.3592079 · SIGIR Conference Proceedings · Short paper

Uncertainty-based Heterogeneous Privileged Knowledge Distillation for Recommendation System

Published: 18 July 2023

ABSTRACT

In industrial recommendation systems, both data volumes and computational resources vary across scenarios. In scenarios with limited data, data sparsity degrades model performance. Transfer learning based on heterogeneous knowledge distillation can be used to transfer knowledge from models trained in data-rich domains. However, in recommendation systems the target domain possesses specific privileged features that contribute significantly to the model, and existing knowledge distillation methods do not take these features into account, leading to suboptimal transfer weights. To overcome this limitation, we propose a novel algorithm called Uncertainty-based Heterogeneous Privileged Knowledge Distillation (UHPKD). Our method quantifies the knowledge of both the source and target models via their uncertainty, and derives transfer weights from the knowledge gain, i.e., the difference in knowledge between the source and target domains. Experiments on both public and industrial datasets demonstrate the superiority of UHPKD over other state-of-the-art methods.
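To make the idea concrete, here is a minimal sketch of how uncertainty-based transfer weighting could look for a CTR-style model. This is an illustrative reconstruction from the abstract only, not the authors' released code: the entropy-based uncertainty measure, the sigmoid weighting of the knowledge gain, and all function and tensor names are assumptions.

```python
# Illustrative sketch only: a plausible reading of "uncertainty -> knowledge gain
# -> per-sample transfer weight" from the abstract. The exact UHPKD formulation
# may differ; entropy and the sigmoid weighting here are assumptions.
import torch
import torch.nn.functional as F


def binary_entropy(p: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    """Entropy of a Bernoulli prediction; higher entropy = higher uncertainty."""
    p = p.clamp(eps, 1 - eps)
    return -(p * p.log() + (1 - p) * (1 - p).log())


def transfer_weights(p_source: torch.Tensor, p_target: torch.Tensor) -> torch.Tensor:
    """Per-sample weight from the 'knowledge gain' of the source over the target.

    Knowledge is taken as negative uncertainty: when the source (teacher) model is
    more certain than the target (student) on a sample, the gain is positive and
    more knowledge is transferred for that sample.
    """
    gain = binary_entropy(p_target) - binary_entropy(p_source)  # > 0: source knows more
    return torch.sigmoid(gain)                                   # squash to (0, 1)


def distillation_loss(p_source: torch.Tensor,
                      p_target: torch.Tensor,
                      labels: torch.Tensor,
                      alpha: float = 0.5) -> torch.Tensor:
    """Supervised BCE on the target plus transfer-weighted teacher matching.

    p_source, p_target: predicted click probabilities of shape (batch,).
    labels: float ground-truth labels in {0, 1} of shape (batch,).
    """
    w = transfer_weights(p_source, p_target).detach()            # weights act as fixed coefficients
    hard = F.binary_cross_entropy(p_target, labels)
    soft = F.binary_cross_entropy(p_target, p_source.detach(), reduction="none")
    return hard + alpha * (w * soft).mean()
```

In this reading, `p_source` would come from a teacher trained on the data-rich source domain (possibly consuming privileged features) and `p_target` from the student being trained on the sparse target domain; samples where the teacher is no more certain than the student contribute little to the distillation term.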


Published in: SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, July 2023, 3567 pages. ISBN: 9781450394086. DOI: 10.1145/3539618. Copyright © 2023 ACM.


Publisher: Association for Computing Machinery, New York, NY, United States.

