ABSTRACT
Click-through rate (CTR) prediction models learn the feature interactions that underlie user behavior and are a core component of recommender systems. However, the size and complexity of existing models limit the settings in which they can be deployed. Knowledge distillation has therefore been adopted in recommender systems to reduce inference latency. Yet when the teacher and student differ greatly in architectural complexity, the student's limited capacity makes the distillation process much less effective.
We present Bridge-based Knowledge Distillation (BKD), a novel approach that employs a bridge model to help the student learn from the teacher's latent representations. The bridge model is built on Graph Neural Networks (GNNs) and uses graph edges to identify significant feature-interaction relationships while pruning redundant ones for efficiency. To further improve distillation, we decouple the extracted knowledge and transfer each component to the student separately, so that each module is distilled more thoroughly. Extensive experiments show that BKD outperforms state-of-the-art competitors on a variety of tasks.
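The abstract only sketches the architecture, so the following is a minimal, hypothetical PyTorch illustration of the two ideas it names: a GNN-based bridge over the teacher's latent field representations, and a distillation loss whose components are transferred separately. All names, shapes, and hyperparameters here (`GNNBridge`, `decoupled_kd_loss`, `alpha`, `tau`) are assumptions for illustration, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GNNBridge(nn.Module):
    """Hypothetical bridge: one round of message passing over a learnable
    field-interaction graph. The soft edge weights play the role the abstract
    describes for GNN edges: highlighting significant feature interactions
    while down-weighting redundant ones."""
    def __init__(self, num_fields: int, dim: int):
        super().__init__()
        # One learnable logit per directed field-pair edge.
        self.edge_logits = nn.Parameter(torch.zeros(num_fields, num_fields))
        self.proj = nn.Linear(dim, dim)

    def forward(self, field_emb: torch.Tensor) -> torch.Tensor:
        # field_emb: (batch, num_fields, dim) latent field representations
        # taken from the (frozen) teacher.
        adj = torch.sigmoid(self.edge_logits)        # soft interaction graph
        msg = torch.einsum("ij,bjd->bid", adj, field_emb)
        return field_emb + F.relu(self.proj(msg))    # bridged representation

def decoupled_kd_loss(student_logit, teacher_logit,
                      student_hidden, bridged_hidden,
                      alpha: float = 0.5, tau: float = 2.0):
    """Transfer each knowledge component separately: (1) response-level
    distillation on the CTR logits, (2) representation-level distillation
    against the bridge-transformed teacher hiddens. `alpha` and `tau` are
    assumed hyperparameters."""
    # Soft-label distillation for a binary (click / no-click) output.
    kd = F.binary_cross_entropy_with_logits(
        student_logit / tau, torch.sigmoid(teacher_logit / tau))
    # Teacher hiddens are assumed detached before entering the bridge, so
    # gradients update only the bridge and the student.
    rep = F.mse_loss(student_hidden, bridged_hidden)
    return alpha * kd + (1.0 - alpha) * rep
```

In training, one would freeze the teacher, compute `bridged_hidden = bridge(teacher_field_emb.detach())`, and add `decoupled_kd_loss` to the student's usual binary cross-entropy loss on observed clicks; keeping the logit and representation terms separate is what "decoupled" transfer refers to in this sketch.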
Index Terms
- BKD: A Bridge-based Knowledge Distillation Method for Click-Through Rate Prediction