skip to main content
10.1145/3604915.3608769acmconferencesArticle/Chapter ViewAbstractPublication PagesrecsysConference Proceedingsconference-collections
research-article

Gradient Matching for Categorical Data Distillation in CTR Prediction

Published:14 September 2023Publication History

ABSTRACT

The cost of hardware and energy consumption on training a click-through rate (CTR) model is highly prohibitive. A recent promising direction for reducing such costs is data distillation with gradient matching, which aims to synthesize a small distilled dataset to guide the model to a similar parameter space as those trained on real data. However, there are two main challenges to implementing such a method in the recommendation field: (1) The categorical recommended data are high dimensional and sparse one- or multi-hot data which will block the gradient flow, causing backpropagation-based data distillation invalid. (2) The data distillation process with gradient matching is computationally expensive due to the bi-level optimization. To this end, we investigate efficient data distillation tailored for recommendation data with plenty of side information where we formulate the discrete data to the dense and continuous data format. Then, we further introduce a one-step gradient matching scheme, which performs gradient matching for only a single step to overcome the inefficient training process. The overall proposed method is called Categorical data distillation with Gradient Matching (CGM), which is capable of distilling a large dataset into a small of informative synthetic data for training CTR models from scratch. Experimental results show that our proposed method not only outperforms the state-of-the-art coreset selection and data distillation methods but also has remarkable cross-architecture performance. Moreover, we explore the application of CGM on model retraining and mitigate the effect of different random seeds on the training results.

References

  1. George Cazenavette, Tongzhou Wang, Antonio Torralba, Alexei A Efros, and Jun-Yan Zhu. 2022. Dataset distillation by matching training trajectories. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 4750–4759.Google ScholarGoogle Scholar
  2. Suming J Chen, Zhen Qin, Zac Wilson, Brian Calaci, Michael Rose, Ryan Evans, Sean Abraham, Donald Metzler, Sandeep Tata, and Michael Colagrosso. 2020. Improving recommendation quality in google drive. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2900–2908.Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Heng-Tze Cheng, Levent Koc, Jeremiah Harmsen, Tal Shaked, Tushar Chandra, Hrishi Aradhye, Glen Anderson, Greg Corrado, Wei Chai, Mustafa Ispir, 2016. Wide & deep learning for recommender systems. In Proceedings of the 1st workshop on deep learning for recommender systems. 7–10.Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Jang Hyun Cho and Bharath Hariharan. 2019. On the efficacy of knowledge distillation. In Proceedings of the IEEE/CVF international conference on computer vision. 4794–4802.Google ScholarGoogle ScholarCross RefCross Ref
  5. Paul Covington, Jay Adams, and Emre Sargin. 2016. Deep neural networks for youtube recommendations. In Proceedings of the 10th ACM conference on recommender systems. 191–198.Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. James Davidson, Benjamin Liebald, Junning Liu, Palash Nandy, Taylor Van Vleet, Ullas Gargi, Sujoy Gupta, Yu He, Mike Lambert, Blake Livingston, 2010. The YouTube video recommendation system. In Proceedings of the fourth ACM conference on Recommender systems. 293–296.Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Tian Dong, Bo Zhao, and Lingjuan Lyu. 2022. Privacy for free: How does dataset condensation help privacy?. In International Conference on Machine Learning. PMLR, 5378–5396.Google ScholarGoogle Scholar
  8. Dan Feldman, Melanie Schmidt, and Christian Sohler. 2020. Turning big data into tiny data: Constant-size coresets for k-means, PCA, and projective clustering. SIAM J. Comput. 49, 3 (2020), 601–657.Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Chengcheng Guo, Bo Zhao, and Yanbing Bai. 2022. Deepcore: A comprehensive library for coreset selection in deep learning. In International Conference on Database and Expert Systems Applications. Springer, 181–195.Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Huifeng Guo, Ruiming Tang, Yunming Ye, Zhenguo Li, and Xiuqiang He. 2017. DeepFM: a factorization-machine based neural network for CTR prediction. In Proceedings of the 26th International Joint Conference on Artificial Intelligence. 1725–1731.Google ScholarGoogle ScholarCross RefCross Ref
  11. Zhexue Huang. 1998. Extensions to the k-means algorithm for clustering large data sets with categorical values. Data mining and knowledge discovery 2, 3 (1998), 283–304.Google ScholarGoogle Scholar
  12. Eric Jang, Shixiang Gu, and Ben Poole. 2016. Categorical Reparameterization with Gumbel-Softmax. In International Conference on Learning Representations.Google ScholarGoogle Scholar
  13. Wei Jin, Xianfeng Tang, Haoming Jiang, Zheng Li, Danqing Zhang, Jiliang Tang, and Bing Yin. 2022. Condensing graphs via one-step gradient matching. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 720–730.Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Yehuda Koren, Robert Bell, and Chris Volinsky. 2009. Matrix factorization techniques for recommender systems. Computer 42, 8 (2009), 30–37.Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Saehyung Lee, Sanghyuk Chun, Sangwon Jung, Sangdoo Yun, and Sungroh Yoon. 2022. Dataset condensation with contrastive signals. In International Conference on Machine Learning. PMLR, 12352–12364.Google ScholarGoogle Scholar
  16. Zeyu Li, Wei Cheng, Yang Chen, Haifeng Chen, and Wei Wang. 2020. Interpretable click-through rate prediction through hierarchical attention. In Proceedings of the 13th International Conference on Web Search and Data Mining. 313–321.Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Yang Liu, Cheng Lyu, Zhiyuan Liu, and Dacheng Tao. 2019. Building effective short video recommendation. In 2019 IEEE International Conference on Multimedia & Expo Workshops (ICMEW). IEEE, 651–656.Google ScholarGoogle ScholarCross RefCross Ref
  18. Andrew L Maas, Awni Y Hannun, Andrew Y Ng, 2013. Rectifier nonlinearities improve neural network acoustic models. In Proc. icml, Vol. 30. Citeseer, 3.Google ScholarGoogle Scholar
  19. Rui Maia and Joao C Ferreira. 2018. Context-aware food recommendation system. Context-aware food recommendation system (2018), 349–356.Google ScholarGoogle Scholar
  20. Baharan Mirzasoleiman, Jeff Bilmes, and Jure Leskovec. 2020. Coresets for data-efficient training of machine learning models. In International Conference on Machine Learning. PMLR, 6950–6960.Google ScholarGoogle Scholar
  21. Mansheej Paul, Surya Ganguli, and Gintare Karolina Dziugaite. 2021. Deep learning on a data diet: Finding important examples early in training. Advances in Neural Information Processing Systems 34 (2021), 20596–20607.Google ScholarGoogle Scholar
  22. Sylvestre-Alvise Rebuffi, Alexander Kolesnikov, Georg Sperl, and Christoph H Lampert. 2017. icarl: Incremental classifier and representation learning. In Proceedings of the IEEE conference on Computer Vision and Pattern Recognition. 2001–2010.Google ScholarGoogle ScholarCross RefCross Ref
  23. Steffen Rendle. 2010. Factorization machines. In 2010 IEEE International conference on data mining. IEEE, 995–1000.Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Noveen Sachdeva, Mehak Dhaliwal, Carole-Jean Wu, and Julian McAuley. 2022. Infinite recommendation networks: A data-centric approach. Advances in Neural Information Processing Systems 35 (2022), 31292–31305.Google ScholarGoogle Scholar
  25. Noveen Sachdeva, Carole-Jean Wu, and Julian McAuley. 2022. On Sampling Collaborative Filtering Datasets. In Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining. 842–850.Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Ozan Sener and Silvio Savarese. 2018. Active Learning for Convolutional Neural Networks: A Core-Set Approach. In International Conference on Learning Representations.Google ScholarGoogle Scholar
  27. Ben Sorscher, Robert Geirhos, Shashank Shekhar, Surya Ganguli, and Ari Morcos. 2022. Beyond neural scaling laws: beating power law scaling via data pruning. Advances in Neural Information Processing Systems 35 (2022), 19523–19536.Google ScholarGoogle Scholar
  28. Mariya Toneva, Alessandro Sordoni, Remi Tachet des Combes, Adam Trischler, Yoshua Bengio, and Geoffrey J Gordon. 2018. An Empirical Study of Example Forgetting during Deep Neural Network Learning. In International Conference on Learning Representations.Google ScholarGoogle Scholar
  29. Qinyong Wang, Hongzhi Yin, Hao Wang, Quoc Viet Hung Nguyen, Zi Huang, and Lizhen Cui. 2019. Enhancing collaborative filtering with generative augmentation. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 548–556.Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Ruoxi Wang, Bin Fu, Gang Fu, and Mingliang Wang. 2017. Deep & cross network for ad click predictions. In Proceedings of the ADKDD’17. 1–7.Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Tongzhou Wang, Jun-Yan Zhu, Antonio Torralba, and Alexei A Efros. 2018. Dataset distillation. arXiv preprint arXiv:1811.10959 (2018).Google ScholarGoogle Scholar
  32. Yaozheng Wang, Dawei Feng, Dongsheng Li, Xinyuan Chen, Yunxiang Zhao, and Xin Niu. 2016. A mobile recommendation system based on logistic regression and gradient boosting decision trees. In 2016 International Joint Conference on Neural Networks (IJCNN). IEEE, 1896–1902.Google ScholarGoogle ScholarCross RefCross Ref
  33. Max Welling. 2009. Herding dynamical weights to learn. In Proceedings of the 26th Annual International Conference on Machine Learning. 1121–1128.Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Ji Yang, Xinyang Yi, Derek Zhiyuan Cheng, Lichan Hong, Yang Li, Simon Xiaoming Wang, Taibai Xu, and Ed H Chi. 2020. Mixed negative sampling for learning two-tower neural networks in recommendations. In Companion Proceedings of the Web Conference 2020. 441–447.Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Jianyu Zhang and Françoise Fogelman-Soulié. 2018. KKbox’s music recommendation challenge solution with feature engineering. In 11th ACM International Conference on Web Search and Data Mining WSDM.Google ScholarGoogle Scholar
  36. Yang Zhang, Fuli Feng, Chenxu Wang, Xiangnan He, Meng Wang, Yan Li, and Yongdong Zhang. 2020. How to retrain recommender system? A sequential meta-learning method. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 1479–1488.Google ScholarGoogle Scholar
  37. Bo Zhao and Hakan Bilen. 2021. Dataset condensation with differentiable siamese augmentation. In International Conference on Machine Learning. PMLR, 12674–12685.Google ScholarGoogle Scholar
  38. Bo Zhao and Hakan Bilen. 2023. Dataset condensation with distribution matching. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 6514–6523.Google ScholarGoogle ScholarCross RefCross Ref
  39. Bo Zhao, Konda Reddy Mopuri, and Hakan Bilen. 2021. Dataset Condensation with Gradient Matching.ICLR 1, 2 (2021), 3.Google ScholarGoogle Scholar
  40. Weijie Zhao, Jingyuan Zhang, Deping Xie, Yulei Qian, Ronglai Jia, and Ping Li. 2019. Aibox: Ctr prediction model training on a single node. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management. 319–328.Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. Guorui Zhou, Xiaoqiang Zhu, Chenru Song, Ying Fan, Han Zhu, Xiao Ma, Yanghui Yan, Junqi Jin, Han Li, and Kun Gai. 2018. Deep interest network for click-through rate prediction. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 1059–1068.Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. Jieming Zhu, Jinyang Liu, Shuai Yang, Qi Zhang, and Xiuqiang He. 2021. Open Benchmarking for Click-Through Rate Prediction. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management. 2759–2769.Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. Ligeng Zhu, Zhijian Liu, and Song Han. 2019. Deep leakage from gradients. Advances in neural information processing systems 32 (2019).Google ScholarGoogle Scholar

Index Terms

  1. Gradient Matching for Categorical Data Distillation in CTR Prediction

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in
      • Published in

        cover image ACM Conferences
        RecSys '23: Proceedings of the 17th ACM Conference on Recommender Systems
        September 2023
        1406 pages

        Copyright © 2023 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 14 September 2023

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • research-article
        • Research
        • Refereed limited

        Acceptance Rates

        Overall Acceptance Rate254of1,295submissions,20%

        Upcoming Conference

        RecSys '24
        18th ACM Conference on Recommender Systems
        October 14 - 18, 2024
        Bari , Italy
      • Article Metrics

        • Downloads (Last 12 months)416
        • Downloads (Last 6 weeks)40

        Other Metrics

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader

      HTML Format

      View this article in HTML Format .

      View HTML Format