ABSTRACT
Large-scale industrial systems often include multiple scenarios to satisfy diverse user needs. The common approach of using one model per scenario does not scale well and not suitable for minor scenarios with limited samples. An solution is to train a model on all scenarios, which can introduce domination and bias from the main scenario. MMoE-like structures have been proposed for multi-scenario prediction, but they do not explicitly address the issue of gradient unbalancing. This work proposes an adaptive loss harmonizing (ALH) algorithm for multi-scenario CTR prediction. It dynamically adjusts the learning speed for balanced training and improved performance. Experiments on real industrial datasets and rigorous A/B testing prove our method’s superiority.
- Rich Caruana. 1997. Multitask learning. Machine learning 28, 1 (1997), 41–75.Google Scholar
- Zhao Chen, Vijay Badrinarayanan, Chen-Yu Lee, and Andrew Rabinovich. 2018. Gradnorm: Gradient normalization for adaptive loss balancing in deep multitask networks. In International conference on machine learning. PMLR, 794–803.Google Scholar
- Michael Crawshaw. 2020. Multi-task learning with deep neural networks: A survey. arXiv preprint arXiv:2009.09796 (2020).Google Scholar
- Yuchen Jiang, Qi Li, Han Zhu, Jinbei Yu, Jin Li, Ziru Xu, Huihui Dong, and Bo Zheng. 2022. Adaptive Domain Interest Network for Multi-domain Recommendation. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management. 3212–3221.Google ScholarDigital Library
- Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).Google Scholar
- Xiao Lin, Hongjie Chen, Changhua Pei, Fei Sun, Xuanji Xiao, Hanxiao Sun, Yongfeng Zhang, Wenwu Ou, and Peng Jiang. 2019. A pareto-efficient algorithm for multiple objective optimization in e-commerce recommendation. In Proceedings of the 13th ACM Conference on recommender systems. 20–28.Google ScholarDigital Library
- Congcong Liu, Yuejiang Li, Xiwei Zhao, Changping Peng, Zhangang Lin, and Jingping Shao. 2022. Concept Drift Adaptation for CTR Prediction in Online Advertising Systems. arXiv preprint arXiv:2204.05101 (2022).Google Scholar
- Congcong Liu, Yuejiang Li, Jian Zhu, Fei Teng, Xiwei Zhao, Changping Peng, Zhangang Lin, and Jingping Shao. 2022. Position Awareness Modeling with Knowledge Distillation for CTR Prediction. In Proceedings of the 16th ACM Conference on Recommender Systems. 562–566.Google ScholarDigital Library
- Congcong Liu, Fei Teng, Xiwei Zhao, Zhangang Lin, Jinghe Hu, and Jingping Shao. 2023. Always Strengthen Your Strengths: A Drift-Aware Incremental Learning Framework for CTR Prediction. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (Taipei, Taiwan) (SIGIR ’23). Association for Computing Machinery, New York, NY, USA, 1806–1810. https://doi.org/10.1145/3539618.3591948Google ScholarDigital Library
- Shikun Liu, Edward Johns, and Andrew J Davison. 2019. End-to-end multi-task learning with attention. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 1871–1880.Google ScholarCross Ref
- Jiaqi Ma, Zhe Zhao, Xinyang Yi, Jilin Chen, Lichan Hong, and Ed H Chi. 2018. Modeling task relationships in multi-task learning with multi-gate mixture-of-experts. In Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining. 1930–1939.Google ScholarDigital Library
- Sebastian Ruder. 2017. An overview of multi-task learning in deep neural networks. arXiv preprint arXiv:1706.05098 (2017).Google Scholar
- Xiang-Rong Sheng, Liqin Zhao, Guorui Zhou, Xinyao Ding, Binding Dai, Qiang Luo, Siran Yang, Jingshan Lv, Chi Zhang, Hongbo Deng, 2021. One model to serve all: Star topology adaptive recommender for multi-domain ctr prediction. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management. 4104–4113.Google ScholarDigital Library
- Hongyan Tang, Junning Liu, Ming Zhao, and Xudong Gong. 2020. Progressive layered extraction (ple): A novel multi-task learning (mtl) model for personalized recommendations. In Fourteenth ACM Conference on Recommender Systems. 269–278.Google ScholarDigital Library
- Ruoxi Wang, Bin Fu, Gang Fu, and Mingliang Wang. 2017. Deep & cross network for ad click predictions. In Proceedings of the ADKDD. ACM, Halifax, NS, Canada, 12:1–12:7.Google Scholar
- Zirui Wang, Yulia Tsvetkov, Orhan Firat, and Yuan Cao. 2020. Gradient vaccine: Investigating and improving multi-task optimization in massively multilingual models. arXiv preprint arXiv:2010.05874 (2020).Google Scholar
- Xuanhua Yang, Xiaoyu Peng, Penghui Wei, Shaoguo Liu, Liang Wang, and Bo Zheng. 2022. AdaSparse: Learning Adaptively Sparse Structures for Multi-Domain Click-Through Rate Prediction. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management. 4635–4639.Google ScholarDigital Library
- Tianhe Yu, Saurabh Kumar, Abhishek Gupta, Sergey Levine, Karol Hausman, and Chelsea Finn. 2020. Gradient surgery for multi-task learning. Advances in Neural Information Processing Systems 33 (2020), 5824–5836.Google Scholar
- Guorui Zhou, Xiaoqiang Zhu, Chengru Song, Ying Fan, Han Zhu, Xiao Ma, Yanghui Yan, Junqi Jin, Han Li, and Kun Gai. 2018. Deep interest network for click-through rate prediction. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD). ACM, London, UK, 1059–1068.Google ScholarDigital Library
Recommendations
ADL: Adaptive Distribution Learning Framework for Multi-Scenario CTR Prediction
SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information RetrievalLarge-scale commercial platforms usually involve numerous business scenarios for diverse business strategies. To provide click-through rate (CTR) predictions for multiple scenarios simultaneously, existing promising multi-scenario models explicitly ...
OptMSM: Optimizing Multi-Scenario Modeling for Click-Through Rate Prediction
Machine Learning and Knowledge Discovery in Databases: Applied Data Science and Demo TrackAbstractA large-scale industrial recommendation platform typically consists of multiple associated scenarios, requiring a unified click-through rate (CTR) prediction model to serve them simultaneously. Existing approaches for multi-scenario CTR prediction ...
Comments