ABSTRACT
With the rapid development of smart TV, TV recommendation is attracting more and more users. TV users usually distribute in multiple regions with different cultures and hence have diverse TV program preferences. From the perspective of engineering practice and performance improvement, it's very essential to model users from multiple regions with one single model. In previous work, Multi-gate Mixture-of-Expert (MMoE) has been widely adopted in multi-task and multi-domain recommendation scenarios. In practice, however, we first observe the embeddings generated by experts tend to be homogeneous which may result in high semantic similarities among embeddings that reduce the capability of Multi-gate Mixture-of-Expert (MMoE) model. Secondly, we also find there are lots of commonalities and differences between multiple regions regarding user preferences. Therefore, it's meaningful to model the complicated relationships between regions. In this paper, we first introduce contrastive learning to overcome the expert representation degeneration problem. The embeddings of two augmented samples generated by the same experts are pushed closer to enhance the alignment, and the embeddings of the same samples generated by different experts are pushed away in vector space to improve uniformity. Then we propose a Graph-based Gating Mechanism to empower typical Multi-gate Mixture-of-Experts. Graph-based MMoE is able to recognize the commonalities and differences among multiple regions by introducing a Graph Neural Network (GNN) with region similarity prior. We name our model Multi-gate Mixture-of-Contrastive-Experts model with Graph-based Gating Mechanism (MMoCEG). Extensive offline experiments and online A/B tests on a commercial TV service provider over 100 million users and 2.3 million items demonstrate the efficacy of MMoCEG compared to the existing models.
Supplemental Material
- Michal Aharon, Eshcar Hillel, Amit Kagian, Ronny Lempel, Hayim Makabee, and Raz Nissim. 2015. Watch-it-next: a contextual tv recommendation system. In Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2015, Porto, Portugal, September 7--11, 2015, Proceedings, Part III 15. Springer, 180--195.Google ScholarCross Ref
- Andreas Argyriou, Theodoros Evgeniou, and Massimiliano Pontil. 2008. Convex multi-task feature learning. Machine learning, 73, 243--272.Google Scholar
- Oren Barkan and Noam Koenigstein. 2016. Item2vec: neural item embedding for collaborative filtering. In 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP). IEEE, 1--6.Google ScholarCross Ref
- Daniel Bis, Maksim Podkorytov, and Xiuwen Liu. 2021. Too much in common: shifting of embeddings in transformer language models and its implications. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 5117-- 5130.Google Scholar
- Rich Caruana. 1997. Multitask learning. Machine learning, 28, 41--75.Google Scholar
- Shi-Yong Chen, Yang Yu, Qing Da, Jun Tan, Hai-Kuan Huang, and Hai-Hong Tang. 2018. Stabilizing reinforcement learning in dynamic environment with application to online recommendation. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 1187--1196.Google ScholarDigital Library
- Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. 2020. A simple framework for contrastive learning of visual representations. ICML.Google Scholar
- Mark Dredze, Alex Kulesza, and Koby Crammer. 2010. Multi-domain learning by confidence-weighted parameter combination. Machine Learning, 79, 1- 2, 123--149.Google ScholarDigital Library
- Kawin Ethayarajh. 2019. How contextual are contextualized word representations? comparing the geometry of bert, elmo, and gpt-2 embeddings. arXiv preprint arXiv:1909.00512.Google Scholar
- Tom Fawcett. 2006. An introduction to roc analysis. Pattern recognition letters, 27, 8, 861--874.Google Scholar
- Yufei Feng, Fuyu Lv, Weichen Shen, Menghan Wang, Fei Sun, Yu Zhu, and Keping Yang. 2019. Deep session interest network for click-through rate prediction. arXiv preprint arXiv:1905.06482.Google Scholar
- Jun Gao, Di He, Xu Tan, Tao Qin, Liwei Wang, and Tie-Yan Liu. 2019. Representation degeneration problem in training natural language generation models. arXiv preprint arXiv:1907.12009.Google Scholar
- Yulong Gu, Wentian Bao, Dan Ou, Xiang Li, Baoliang Cui, Biyu Ma, Haikuan Huang, Qingwen Liu, and Xiaoyi Zeng. 2021. Self-supervised learning on users' spontaneous behaviors for multi-scenario ranking in e-commerce. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, 3828--3837.Google ScholarDigital Library
- Huifeng Guo, Ruiming Tang, Yunming Ye, Zhenguo Li, and Xiuqiang He. 2017. Deepfm: a factorization-machine based neural network for ctr prediction. arXiv preprint arXiv:1703.04247.Google Scholar
- Xiaobo Hao, Yudan Liu, Ruobing Xie, Kaikai Ge, Linyao Tang, Xu Zhang, and Leyu Lin. 2021. Adversarial feature translation for multi-domain recommendation. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, 2964--2973.Google ScholarDigital Library
- Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, and Ross Girshick. 2020. Momentum contrast for unsupervised visual representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 9729--9738.Google ScholarCross Ref
- Shang Hwa Hsu, Ming-Hui Wen, Hsin-Chieh Lin, Chun-Chia Lee, Chia-Hoang Lee, et al. 2007. Aimed-a personalized tv recommendation system. In EuroITV, 166--174.Google Scholar
- Alex Kendall, Yarin Gal, and Roberto Cipolla. 2018. Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In Proceedings of the IEEE conference on computer vision and pattern recognition, 7482--7491.Google Scholar
- Diederik P Kingma and Jimmy Ba. 2014. Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980.Google Scholar
- Bohan Li, Hao Zhou, Junxian He, Mingxuan Wang, Yiming Yang, and Lei Li. 2020. On the sentence embeddings from pre-trained language models. arXiv preprint arXiv:2011.05864.Google Scholar
- Pengcheng Li, Runze Li, Qing Da, An-Xiang Zeng, and Lijun Zhang. 2020. Improving multi-scenario learning to rank in e-commerce by exploiting task relationships in the label space. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, 2605--2612.Google ScholarDigital Library
- Jiaqi Ma, Zhe Zhao, Xinyang Yi, Jilin Chen, Lichan Hong, and Ed H Chi. 2018. Modeling task relationships in multi-task learning with multi-gate mixtureof- experts. In Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining, 1930--1939.Google ScholarDigital Library
- Ishan Misra, Abhinav Shrivastava, Abhinav Gupta, and Martial Hebert. 2016. Cross-stitch networks for multi-task learning. In Proceedings of the IEEE conference on computer vision and pattern recognition, 3994--4003.Google ScholarCross Ref
- Jiarui Qin et al. 2023. Learning to distinguish multi-user coupling behaviors for tv recommendation. In Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining, 204--212.Google Scholar
- Ruihong Qiu, Zi Huang, Hongzhi Yin, and Zijian Wang. 2022. Contrastive learning for representation degeneration problem in sequential recommendation. In Proceedings of the fifteenth ACM international conference on web search and data mining, 813--823.Google ScholarDigital Library
- Sebastian Ruder. 2017. An overview of multi-task learning in deep neural networks. arXiv preprint arXiv:1706.05098.Google Scholar
- Xiang-Rong Sheng et al. 2021. One model to serve all: star topology adaptive recommender for multi-domain ctr prediction. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, 4104--4113.Google Scholar
- Hongyan Tang, Junning Liu, Ming Zhao, and Xudong Gong. 2020. Progressive layered extraction (ple): a novel multi-task learning (mtl) model for personalized recommendations. In Proceedings of the 14th ACM Conference on Recommender Systems, 269--278.Google ScholarDigital Library
- Chenyang Wang et al. 2022. Target interest distillation for multi-interest recommendation. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management, 2007--2016.Google Scholar
- Fangye Wang, Yingxu Wang, Dongsheng Li, Hansu Gu, Tun Lu, Peng Zhang, and Ning Gu. 2022. Cl4ctr: a contrastive learning framework for ctr prediction. arXiv preprint arXiv:2212.00522.Google Scholar
- Fangye Wang, Yingxu Wang, Dongsheng Li, Hansu Gu, Tun Lu, Peng Zhang, and Ning Gu. 2023. Cl4ctr: a contrastive learning framework for ctr prediction. In Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining, 805--813.Google ScholarDigital Library
- Lingxiao Wang, Jing Huang, Kevin Huang, Ziniu Hu, Guangtao Wang, and Quanquan Gu. 2020. Improving neural language generation with spectrum control. In International Conference on Learning Representations.Google Scholar
- Tongzhou Wang and Phillip Isola. 2020. Understanding contrastive representation learning through alignment and uniformity on the hypersphere. In International Conference on Machine Learning. PMLR, 9929--9939.Google Scholar
- Tongzhou Wang and Phillip Isola. 2020. Understanding contrastive representation learning through alignment and uniformity on the hypersphere. In International Conference on Machine Learning. PMLR, 9929--9939.Google Scholar
- Yichao Wang et al. 2022. Causalint: causal inspired intervention for multiscenario recommendation. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 4090--4099.Google Scholar
- Zhirong Wu, Yuanjun Xiong, Stella X Yu, and Dahua Lin. 2018. Unsupervised feature learning via non-parametric instance discrimination. In CVPR, 3733-- 3742.Google Scholar
- Zhibo Xiao, Luwei Yang, Wen Jiang, Yi Wei, Yi Hu, and Hao Wang. 2020. Deep multi-interest network for click-through rate prediction. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, 2265--2268.Google ScholarDigital Library
- Tiansheng Yao et al. 2021. Self-supervised learning for large-scale item recommendations. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, 4321--4330.Google Scholar
- Yuanliang Zhang, Xiaofeng Wang, Jinxin Hu, Ke Gao, Chenyi Lei, and Fei Fang. 2022. Scenario-adaptive and self-supervised model for multi-scenario personalized recommendation. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management, 3674--3683.Google ScholarDigital Library
- Guorui Zhou, Na Mou, Ying Fan, Qi Pi, Weijie Bian, Chang Zhou, Xiaoqiang Zhu, and Kun Gai. 2019. Deep interest evolution network for click-through rate prediction. In Proceedings of the AAAI conference on artificial intelligence number 01. Vol. 33, 5941--5948.Google ScholarDigital Library
- Guorui Zhou et al. 2018. Deep interest network for click-through rate prediction. In Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining, 1059--1068.Google Scholar
- Kun Zhou, Hui Wang, Wayne Xin Zhao, Yutao Zhu, Sirui Wang, Fuzheng Zhang, Zhongyuan Wang, and Ji-Rong Wen. 2020. S3-rec: self-supervised learning for sequential recommendation with mutual information maximization. In Proceedings of the 29th ACM international conference on information & knowledge management, 1893--1902.Google ScholarDigital Library
Index Terms
- Multi-gate Mixture-of-Contrastive-Experts with Graph-based Gating Mechanism for TV Recommendation
Recommendations
Self-Supervised Group Graph Collaborative Filtering for Group Recommendation
WSDM '23: Proceedings of the Sixteenth ACM International Conference on Web Search and Data MiningNowadays, it is more and more convenient for people to participate in group activities. Therefore, providing some recommendations to groups of individuals is indispensable. Group recommendation is the task of suggesting items or events for a group of ...
Cross-view temporal graph contrastive learning for session-based recommendation
AbstractSession-based recommendation (SBR) aims at recommending items given the behavior sequences of anonymous users in a short-term session. Many recent SBR methods construct all sessions as a global graph that captures cross-session item transition ...
Contrastive Collaborative Filtering for Cold-Start Item Recommendation
WWW '23: Proceedings of the ACM Web Conference 2023The cold-start problem is a long-standing challenge in recommender systems. As a promising solution, content-based generative models usually project a cold-start item’s content onto a warm-start item embedding to capture collaborative signals from item ...
Comments