Multi-gate Mixture-of-Contrastive-Experts with Graph-based Gating Mechanism for TV Recommendation

Authors:
Cong Zhang, China Mobile Research Institute, Beijing, China (ORCID: 0009-0007-5132-6261)
Dongyang Liu, China Mobile Research Institute, Beijing, China (ORCID: 0009-0002-6160-0707)
Lin Zuo, China Mobile Research Institute, Xi'an, China (ORCID: 0009-0005-2413-6898)
Junlan Feng, China Mobile Research Institute, Beijing, China (ORCID: 0000-0001-5292-2945)
Chao Deng, China Mobile Research Institute, Beijing, China (ORCID: 0000-0003-4449-5247)
Jian Sun, China Mobile Research Institute, Beijing, China (ORCID: 0009-0001-8503-4458)
Haitao Zeng, China Mobile Research Institute, Beijing, China (ORCID: 0000-0003-2728-9724)
Yaohong Zhao, China Mobile Research Institute, Beijing, China (ORCID: 0009-0007-2264-7939)

CIKM '23: Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, October 2023, Pages 4938–4944
https://doi.org/10.1145/3583780.3615488

Published: 21 October 2023

ABSTRACT

With the rapid development of smart TVs, TV recommendation is attracting more and more users. TV users are usually distributed across multiple regions with different cultures and hence have diverse TV program preferences. From the perspective of both engineering practice and performance improvement, it is essential to model users from multiple regions with a single model. In previous work, the Multi-gate Mixture-of-Experts (MMoE) model has been widely adopted in multi-task and multi-domain recommendation scenarios. In practice, however, we first observe that the embeddings generated by the experts tend to be homogeneous, which may result in high semantic similarity among embeddings and thereby reduce the capability of the MMoE model. Second, we find that user preferences across regions exhibit many commonalities as well as differences, so it is meaningful to model the complicated relationships between regions. In this paper, we first introduce contrastive learning to overcome the expert representation degeneration problem. The embeddings of two augmented samples generated by the same expert are pulled closer to enhance alignment, and the embeddings of the same sample generated by different experts are pushed apart in vector space to improve uniformity. Then we propose a Graph-based Gating Mechanism to empower the typical MMoE. The Graph-based MMoE is able to recognize the commonalities and differences among multiple regions by introducing a Graph Neural Network (GNN) with a region similarity prior. We name our model the Multi-gate Mixture-of-Contrastive-Experts model with Graph-based Gating Mechanism (MMoCEG). Extensive offline experiments and online A/B tests on a commercial TV service provider with over 100 million users and 2.3 million items demonstrate the efficacy of MMoCEG compared to existing models.
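The two ideas in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the abstract does not give the loss or the GNN layer, so the InfoNCE-style form of the loss, the temperature value, the single row-normalized propagation step, and all function names (`expert_contrastive_loss`, `graph_gate`) are assumptions made here for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def expert_contrastive_loss(view_a, view_b, temperature=0.1):
    """Assumed InfoNCE-style loss for one sample.

    view_a[k] and view_b[k] are expert k's embeddings of two augmentations
    of the same sample. The positive pair is (same expert, two views); the
    negatives are (different experts, same view).
    """
    num_experts = len(view_a)
    total = 0.0
    for k in range(num_experts):
        pos = math.exp(cosine(view_a[k], view_b[k]) / temperature)
        neg = sum(
            math.exp(cosine(view_a[k], view_a[j]) / temperature)
            for j in range(num_experts)
            if j != k
        )
        total += -math.log(pos / (pos + neg))
    return total / num_experts


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def graph_gate(region_logits, similarity):
    """Assumed one-step graph propagation of gate logits.

    region_logits[r][k] is region r's raw gate logit for expert k.
    similarity[r][s] is a non-negative region similarity prior that
    includes self-loops. Each region's logits are replaced by the
    similarity-weighted average over regions, then softmax-normalized.
    """
    num_experts = len(region_logits[0])
    gates = []
    for row in similarity:
        norm = sum(row)
        mixed = [
            sum(row[s] * region_logits[s][k] for s in range(len(row))) / norm
            for k in range(num_experts)
        ]
        gates.append(softmax(mixed))
    return gates


# Two experts with distinct embeddings versus two collapsed experts.
diverse = expert_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
collapsed = expert_contrastive_loss([[1.0, 0.0], [1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]])

# Two regions: unrelated (identity prior) versus fully similar (all-ones prior).
logits = [[2.0, 0.0], [0.0, 2.0]]
gates_unrelated = graph_gate(logits, [[1.0, 0.0], [0.0, 1.0]])
gates_similar = graph_gate(logits, [[1.0, 1.0], [1.0, 1.0]])
```

In this toy setting the loss is lower when the experts produce distinct embeddings than when they collapse to the same vector, which is the degeneration the contrastive term is meant to discourage. Likewise, a stronger similarity prior between two regions pulls their gate distributions toward each other, while an identity prior leaves each region with its own gate.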

Supplemental Material

1152-video.mp4 (MP4, 261.2 MB)

Index Terms

1. Information systems
  1. Information systems applications
    1. Data mining

Published in
CIKM '23: Proceedings of the 32nd ACM International Conference on Information and Knowledge Management
October 2023, 5508 pages
ISBN: 9798400701245
DOI: 10.1145/3583780
General Chairs: Ingo Frommholz (University of Wolverhampton, UK), Frank Hopfgartner (University of Koblenz, Germany), Mark Lee (University of Birmingham, UK), Michael Oakes (University of Birmingham, UK)
Program Chairs: Mounia Lalmas (Spotify, UK), Min Zhang (Tsinghua University, China), Rodrygo Santos (Federal University of Minas Gerais, Brazil)
Copyright © 2023 ACM
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags
contrastive learning
graph neural network
multi-gate mixture of experts
multi-region TV recommendation
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%