DOI: 10.1145/3459637.3481941
research-article

One Model to Serve All: Star Topology Adaptive Recommender for Multi-Domain CTR Prediction

Published: 30 October 2021

Abstract

Traditional industrial recommender systems usually train a model on data from a single domain and then use it to serve that domain. However, a large-scale commercial platform often contains multiple domains, and its recommender system must make click-through rate (CTR) predictions for all of them. Different domains may share some user groups and items, while each domain may also have its own unique users and items. Moreover, the same user may behave differently in different domains. To leverage the data from all domains, a single model can be trained to serve every domain; however, it is difficult for one model to capture the characteristics of each domain and serve all of them well. On the other hand, training a separate model for each domain does not fully use the data from the other domains. In this paper, we propose the Star Topology Adaptive Recommender (STAR) model, which trains a single model to serve all domains by leveraging data from all domains simultaneously, capturing the characteristics of each domain, and modeling the commonalities between domains. Essentially, the network of each domain consists of two factorized networks: a centered network shared by all domains and a domain-specific network tailored to that domain. For each domain, we combine these two factorized networks into a unified network by element-wise multiplying the weights of the shared network with those of the domain-specific network. Other combination functions are possible and remain open for further research. Most importantly, STAR learns the shared network from all the data and adapts the domain-specific parameters to the characteristics of each domain. Experimental results on production data validate the superiority of the proposed STAR model.
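The combination step described above can be sketched as a single fully connected layer. This is a minimal, hypothetical sketch in plain Python and not the authors' implementation. The names `StarFCLayer` and `sigmoid` are illustrative. Two choices are assumptions: the domain-specific weights start at 1.0, and the shared and domain-specific biases are summed. The abstract only states that the shared and domain-specific weights are multiplied element-wise. Training is omitted.

```python
import math
import random


class StarFCLayer:
    """Hypothetical fully connected layer with star-topology parameters.

    One shared ("centered") weight matrix is used by every domain, and each
    domain also owns a domain-specific weight matrix of the same shape.
    """

    def __init__(self, in_dim, out_dim, domains, seed=0):
        rng = random.Random(seed)
        # Shared weights: in training these would receive updates from all domains.
        self.shared_w = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)]
                         for _ in range(out_dim)]
        self.shared_b = [0.0] * out_dim
        # Domain-specific weights, initialized to 1.0 (assumption) so that each
        # domain's effective weights start out equal to the shared weights.
        self.domain_w = {d: [[1.0] * in_dim for _ in range(out_dim)] for d in domains}
        self.domain_b = {d: [0.0] * out_dim for d in domains}

    def effective_weights(self, domain):
        # Unified weights for one domain: element-wise (Hadamard) product.
        return [[ws * wd for ws, wd in zip(row_s, row_d)]
                for row_s, row_d in zip(self.shared_w, self.domain_w[domain])]

    def forward(self, x, domain):
        w = self.effective_weights(domain)
        # Summing shared and domain-specific biases is an assumption of this sketch.
        b = [bs + bd for bs, bd in zip(self.shared_b, self.domain_b[domain])]
        return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def sigmoid(z):
    # Maps a logit to a predicted click probability.
    return 1.0 / (1.0 + math.exp(-z))


# Usage: two hypothetical domains share one layer.
layer = StarFCLayer(in_dim=3, out_dim=1, domains=["domain_a", "domain_b"])
x = [0.5, -1.0, 2.0]
# Adapting a domain-specific weight changes only that domain's prediction.
layer.domain_w["domain_b"][0][0] = 3.0
p_a = sigmoid(layer.forward(x, "domain_a")[0])
p_b = sigmoid(layer.forward(x, "domain_b")[0])
print(f"pCTR domain_a={p_a:.4f}, domain_b={p_b:.4f}")
```

In this layer, every domain's effective weights are the shared weights scaled element-wise by that domain's own factors. During training, the shared matrix would therefore receive gradients from samples of every domain. Each domain-specific matrix would be updated only by samples from its own domain.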
Since late 2020, STAR has been deployed in Alibaba's display advertising system, yielding an 8.0% improvement in CTR and a 6.0% increase in RPM (Revenue Per Mille).
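The sketch below shows how the two online metrics are conventionally computed and how a relative improvement between a control bucket and a treatment bucket is derived. CTR is computed as clicks per impression, and RPM as revenue per 1,000 impressions. The sketch assumes that the reported 8.0% and 6.0% figures are relative lifts, which the abstract does not state explicitly. The bucket counts are hypothetical and are not data from the paper.

```python
def ctr(clicks, impressions):
    """Click-through rate: clicks per impression."""
    return clicks / impressions


def rpm(revenue, impressions):
    """Revenue Per Mille: revenue per 1,000 impressions."""
    return revenue / impressions * 1000


def relative_lift(treatment, control):
    """Relative improvement of a treatment metric over a control metric."""
    return (treatment - control) / control


# Hypothetical A/B bucket numbers, for illustration only (not from the paper).
control = {"impressions": 1_000_000, "clicks": 20_000, "revenue": 5_000.0}
treatment = {"impressions": 1_000_000, "clicks": 21_000, "revenue": 5_250.0}

ctr_lift = relative_lift(ctr(treatment["clicks"], treatment["impressions"]),
                         ctr(control["clicks"], control["impressions"]))
rpm_lift = relative_lift(rpm(treatment["revenue"], treatment["impressions"]),
                         rpm(control["revenue"], control["impressions"]))
print(f"CTR lift: {ctr_lift:.1%}, RPM lift: {rpm_lift:.1%}")
```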

    Information

    Published In

    CIKM '21: Proceedings of the 30th ACM International Conference on Information & Knowledge Management
    October 2021
    4966 pages
    ISBN: 9781450384469
    DOI: 10.1145/3459637
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. display advertising
    2. multi-domain learning
    3. recommender system

    Conference

    CIKM '21

    Acceptance Rates

    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

    Bibliometrics

    • Downloads (last 12 months): 211
    • Downloads (last 6 weeks): 29

    Reflects downloads up to 27 Feb 2025

    Cited By

    • (2025) A Contrastive Pretrain Model with Prompt Tuning for Multi-center Medication Recommendation. ACM Transactions on Information Systems 43:3 (1-29). DOI: 10.1145/3706631. Online publication date: 3-Jan-2025
    • (2025) AMLCDR: An Adaptive Meta-Learning Model for Cross-Domain Recommendation by Aligning Preference Distributions. Proceedings of the Eighteenth ACM International Conference on Web Search and Data Mining (606-615). DOI: 10.1145/3701551.3703539. Online publication date: 10-Mar-2025
    • (2025) Towards Personalized Federated Multi-Scenario Multi-Task Recommendation. Proceedings of the Eighteenth ACM International Conference on Web Search and Data Mining (429-438). DOI: 10.1145/3701551.3703523. Online publication date: 10-Mar-2025
    • (2025) Hybrid contrastive multi-scenario learning for multi-task sequential-dependence recommendation. Neural Networks 183 (106953). DOI: 10.1016/j.neunet.2024.106953. Online publication date: Mar-2025
    • (2025) A user behavior-aware multi-task learning model for enhanced short video recommendation. Neurocomputing 617 (129076). DOI: 10.1016/j.neucom.2024.129076. Online publication date: Feb-2025
    • (2025) Towards Mixture of Task-Intensive Experts for Multi-task Recommendation. Database Systems for Advanced Applications (323-332). DOI: 10.1007/978-981-97-5555-4_22. Online publication date: 12-Jan-2025
    • (2024) D3. Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence and Thirty-Sixth Conference on Innovative Applications of Artificial Intelligence and Fourteenth Symposium on Educational Advances in Artificial Intelligence (8553-8561). DOI: 10.1609/aaai.v38i8.28699. Online publication date: 20-Feb-2024
    • (2024) Tag Tree-Guided Multi-grained Alignment for Multi-Domain Short Video Recommendation. Proceedings of the 32nd ACM International Conference on Multimedia (5683-5691). DOI: 10.1145/3664647.3681692. Online publication date: 28-Oct-2024
    • (2024) An Active Masked Attention Framework for Many-to-Many Cross-Domain Recommendations. Proceedings of the 32nd ACM International Conference on Multimedia (9680-9689). DOI: 10.1145/3664647.3681435. Online publication date: 28-Oct-2024
    • (2024) Multi-Scenario and Multi-Task Aware Feature Interaction for Recommendation System. ACM Transactions on Knowledge Discovery from Data 18:6 (1-20). DOI: 10.1145/3651312. Online publication date: 12-Apr-2024
