Task-distribution-aware Meta-learning for Cold-start CTR Prediction

Authors:
Tianwei Cao

School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing, China

Qianqian Xu

Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, CAS, Beijing, China

Zhiyong Yang

State Key Lab. of Information Security, Institute of Information Engineering, CAS; School of Cyber Security, UCAS, Beijing, China

Qingming Huang

Key Lab. of IIP, Inst. of Comput. Tech., CAS; Sch. of Computer Sci. and Tech., UCAS; Key Lab. of BDKM, CAS; Peng Cheng Lab., Beijing, China


MM '20: Proceedings of the 28th ACM International Conference on Multimedia, October 2020, Pages 3514–3522. https://doi.org/10.1145/3394171.3413739

Published: 12 October 2020

ABSTRACT

Click-through rate (CTR) prediction has achieved great success in online advertising. However, making accurate predictions for unseen ads remains challenging; this is known as the cold-start problem. To address it, meta-learning methods have recently emerged as a popular direction. These approaches treat the predictions for each user/item as an individual task and train a meta-learner on these tasks to perform zero-shot/few-shot learning on unseen tasks. Although these approaches have effectively alleviated the cold-start problem, two facts have not received enough attention: 1) the diversity of task difficulty and 2) the perturbation of the task distribution. In this paper, we propose an adaptive loss that ensures consistency between each task's weight and its difficulty. Interestingly, the loss function can also be viewed as a description of the worst-case performance under distribution perturbation. Moreover, we develop an algorithm, under the framework of gradient descent with max-oracle (GDmax), to minimize this adaptive loss, and we prove that the algorithm can return a stationary point of the adaptive loss. Finally, we implement our method on top of the meta-embedding framework and conduct experiments on three real-world datasets. The experiments show that the proposed method significantly improves predictions in the cold-start scenario.
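The abstract does not give the exact form of the adaptive loss, so the following is only a minimal sketch of the general idea it describes: harder tasks receive larger weights, the weighted loss is a worst case over perturbed task distributions, and a GDmax-style loop solves the inner maximization exactly before each outer gradient step. Everything specific here is an assumption made for illustration and is not taken from the paper: the L2-penalized perturbation of the uniform task distribution, the penalty strength `lam`, the logistic-regression tasks, and all function names (`project_to_simplex`, `max_oracle`, `adaptive_loss`, `gdmax`, `make_toy_tasks`). NumPy is the only dependency.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto the probability simplex (sort-based)."""
    u = np.sort(v)[::-1]                        # entries in descending order
    css = np.cumsum(u) - 1.0
    ks = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / ks > 0)[0][-1]   # last index that stays positive
    tau = css[rho] / (rho + 1)                  # shift that makes the result sum to 1
    return np.maximum(v - tau, 0.0)


def max_oracle(losses, lam):
    """Exact maximizer over the simplex of p @ losses - lam/2 * ||p - uniform||^2.

    ASSUMPTION: this L2-penalized perturbation is chosen for illustration only.
    """
    uniform = np.full(losses.size, 1.0 / losses.size)
    return project_to_simplex(uniform + losses / lam)


def adaptive_loss(losses, lam):
    """Worst-case weighted loss under the penalized reweighting above."""
    p = max_oracle(losses, lam)
    uniform = np.full(losses.size, 1.0 / losses.size)
    return p @ losses - 0.5 * lam * np.sum((p - uniform) ** 2)


def task_losses_and_grads(theta, tasks):
    """Mean logistic loss and its gradient for every task (shared theta)."""
    losses, grads = [], []
    for X, y in tasks:
        z = X @ theta
        losses.append(np.mean(np.logaddexp(0.0, z) - y * z))
        prob = 1.0 / (1.0 + np.exp(-z))
        grads.append(X.T @ (prob - y) / y.size)
    return np.array(losses), np.stack(grads)


def gdmax(tasks, dim, lam=1.0, lr=0.5, steps=200):
    """Alternate an exact inner maximization with one outer gradient step."""
    theta = np.zeros(dim)
    for _ in range(steps):
        losses, grads = task_losses_and_grads(theta, tasks)
        p = max_oracle(losses, lam)          # inner max solved exactly
        theta -= lr * (p @ grads)            # gradient of the max (Danskin's theorem)
    return theta


def make_toy_tasks(n_tasks=4, n=50, dim=3, seed=0):
    """Hypothetical synthetic tasks whose label noise grows with the task index."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=dim)
    tasks = []
    for i in range(n_tasks):
        X = rng.normal(size=(n, dim))
        y = (X @ w > 0).astype(float)
        flip = rng.random(n) < 0.1 * i       # noisier labels make a harder task
        tasks.append((X, np.where(flip, 1.0 - y, y)))
    return tasks


if __name__ == "__main__":
    tasks = make_toy_tasks()
    theta = gdmax(tasks, dim=3)
    losses, _ = task_losses_and_grads(theta, tasks)
    print("task losses :", np.round(losses, 3))
    print("task weights:", np.round(max_oracle(losses, lam=1.0), 3))
```

In this sketch the task weights are a monotone function of the task losses, which is one simple way to keep weight and difficulty consistent. Because the uniform weighting is always feasible, the adaptive loss is never smaller than the plain average of the task losses. A smaller `lam` lets the weights concentrate more on the hardest tasks, while a larger `lam` keeps them close to uniform.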

Supplemental Material

3394171.3413739.mp4 (mp4, 56.1 MB)

mmfp1803aux.zip (632.8 KB): contains a file named sup.pdf, which presents theoretical results and some experimental results.

Index Terms

Task-distribution-aware Meta-learning for Cold-start CTR Prediction
1. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Recommender systems

Published in
MM '20: Proceedings of the 28th ACM International Conference on Multimedia
October 2020
4889 pages
ISBN:9781450379885
DOI:10.1145/3394171
General Chairs: Chang Wen Chen (Chinese University of Hong Kong, Shenzhen, China), Rita Cucchiara (UNIMORE, Italy), Xian-Sheng Hua (Alibaba Group, China)
Program Chairs: Guo-Jun Qi (Futurewei Technologies, USA), Elisa Ricci (UNITN & Fondazione Bruno Kessler, Italy), Zhengyou Zhang (Tencent, China), Roger Zimmermann (National University of Singapore, Singapore)
Copyright © 2020 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
cold start
ctr prediction
meta-learning
min-max optimization
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 995 of 4,171 submissions, 24%