DOI: 10.1145/3539618.3591958
Short Paper

BKD: A Bridge-based Knowledge Distillation Method for Click-Through Rate Prediction

Published: 18 July 2023

ABSTRACT

Click-through rate (CTR) prediction models learn the feature interactions underlying user behaviors and are a crucial component of recommendation systems. However, the size and complexity of existing models limit the settings in which they can be deployed, so knowledge distillation has been applied in recommendation systems to reduce inference latency. When the teacher and student differ substantially in architectural complexity, however, the student's lower capacity makes the distillation process less effective.
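
To make the capacity-gap issue concrete, below is a minimal sketch of conventional response-based distillation (in the style of Hinton et al.) applied to binary CTR logits. This is the baseline setting that BKD improves on, not BKD itself; the temperature and weighting values are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def ctr_distillation_loss(student_logits, teacher_logits, labels,
                          temperature=2.0, alpha=0.5):
    """Response-based KD loss for binary CTR prediction.

    Combines the usual binary cross-entropy against observed clicks
    with a soft-label term that pushes the student's click
    probabilities toward the teacher's temperature-smoothed ones.
    """
    # Hard-label loss: fit the observed click / no-click labels.
    hard = F.binary_cross_entropy_with_logits(student_logits, labels)
    # Soft-label loss: match the teacher's smoothed click probabilities.
    teacher_prob = torch.sigmoid(teacher_logits / temperature)
    soft = F.binary_cross_entropy_with_logits(
        student_logits / temperature, teacher_prob)
    return alpha * hard + (1 - alpha) * soft

# Toy usage: a batch of 4 examples.
student_logits = torch.randn(4)
teacher_logits = torch.randn(4)
labels = torch.tensor([1.0, 0.0, 0.0, 1.0])
loss = ctr_distillation_loss(student_logits, teacher_logits, labels)
```

When the architecture gap is large, this soft-label term alone transfers little of the teacher's intermediate structure; that is the shortfall the bridge model described next is meant to address.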

We present a novel knowledge distillation approach, Bridge-based Knowledge Distillation (BKD), which employs a bridge model to help the student learn from the teacher's latent representations. The bridge model is based on Graph Neural Networks (GNNs): it uses graph edges to identify significant feature interaction relationships while pruning redundant ones for efficiency. To further improve distillation, we decouple the extracted knowledge and transfer each component to the student separately, so that each module is distilled more thoroughly. Extensive experiments show that BKD outperforms state-of-the-art competitors on various tasks.
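
The abstract does not spell out BKD's architecture, so the following is only a speculative PyTorch sketch of the stated ideas: a GNN-style bridge that passes messages between feature-field nodes over a learnable adjacency (near-zero edge weights acting as pruned, redundant interactions), plus decoupled loss terms that transfer each knowledge component separately. Every module name, shape, and loss choice here is an assumption, not the paper's exact method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphBridge(nn.Module):
    """Hypothetical bridge: one message-passing step over feature-field
    nodes, with a learnable adjacency whose weak edges are softly
    suppressed, cutting redundant feature interactions."""

    def __init__(self, num_fields, dim):
        super().__init__()
        self.adj_logits = nn.Parameter(torch.zeros(num_fields, num_fields))
        self.proj = nn.Linear(dim, dim)

    def forward(self, field_emb):           # field_emb: (B, F, D)
        # Edge weights in (0, 1); values near 0 prune an interaction.
        adj = torch.sigmoid(self.adj_logits)
        messages = torch.einsum('ij,bjd->bid', adj, field_emb)
        return F.relu(self.proj(messages))  # bridged representation

def bkd_losses(student_repr, bridge_repr, student_logits,
               teacher_logits, labels, temperature=2.0):
    """Decoupled transfer: one term per knowledge component.
    Assumes student_repr is already projected to bridge_repr's shape."""
    # (1) Representation knowledge, routed through the bridge.
    repr_loss = F.mse_loss(student_repr, bridge_repr.detach())
    # (2) Response knowledge, via softened teacher probabilities.
    teacher_prob = torch.sigmoid(teacher_logits / temperature)
    resp_loss = F.binary_cross_entropy_with_logits(
        student_logits / temperature, teacher_prob)
    # (3) Task loss on the observed clicks.
    task_loss = F.binary_cross_entropy_with_logits(student_logits, labels)
    return repr_loss, resp_loss, task_loss

# Toy usage: batch of 2, 5 feature fields, 8-dim embeddings.
bridge = GraphBridge(num_fields=5, dim=8)
teacher_fields = torch.randn(2, 5, 8)   # stand-in teacher field embeddings
bridge_repr = bridge(teacher_fields)
student_repr = torch.randn(2, 5, 8)     # stand-in student representation
losses = bkd_losses(student_repr, bridge_repr,
                    torch.randn(2), torch.randn(2),
                    torch.tensor([1.0, 0.0]))
```

In this reading, the bridge output stands in for the teacher's latent representations at a complexity the student can match, and detaching it keeps the gradient flow of each decoupled term separate.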


Published in

SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval
July 2023, 3567 pages
ISBN: 9781450394086
DOI: 10.1145/3539618

Copyright © 2023 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

Overall acceptance rate: 792 of 3,983 submissions, 20%