DOI: 10.1145/3583780.3615488
Research article

Multi-gate Mixture-of-Contrastive-Experts with Graph-based Gating Mechanism for TV Recommendation

Published: 21 October 2023

ABSTRACT

With the rapid development of smart TV, TV recommendation is attracting more and more users. TV users are usually distributed across multiple regions with different cultures and hence have diverse TV program preferences. From the perspective of engineering practice and performance, it is essential to model users from multiple regions with a single model. In previous work, the Multi-gate Mixture-of-Experts (MMoE) model has been widely adopted in multi-task and multi-domain recommendation scenarios. In practice, however, we first observe that the embeddings generated by different experts tend to be homogeneous; the resulting high semantic similarity among expert embeddings reduces the capability of MMoE. Secondly, we find that user preferences across regions exhibit both commonalities and differences, so it is meaningful to model the complicated relationships between regions. In this paper, we first introduce contrastive learning to overcome the expert representation degeneration problem: the embeddings of two augmented samples generated by the same expert are pulled closer to enhance alignment, while the embeddings of the same sample generated by different experts are pushed apart in vector space to improve uniformity. We then propose a Graph-based Gating Mechanism to empower the typical Multi-gate Mixture-of-Experts: the graph-based gate recognizes the commonalities and differences among multiple regions by introducing a Graph Neural Network (GNN) with a region-similarity prior. We name our model the Multi-gate Mixture-of-Contrastive-Experts model with Graph-based Gating Mechanism (MMoCEG). Extensive offline experiments and online A/B tests on a commercial TV service provider with over 100 million users and 2.3 million items demonstrate the efficacy of MMoCEG compared to existing models.
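The abstract describes two mechanisms: a contrastive regularizer over expert outputs (alignment between two augmented views encoded by the same expert, uniformity across different experts encoding the same view) and a graph-based gate that propagates per-region gating signals through a region-similarity graph. The sketch below is not the authors' released code; it is a minimal PyTorch illustration under assumed names (MMoCEGSketch, expert_contrastive_loss, region_sim) and assumed shapes, intended only to make the two ideas concrete.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MMoCEGSketch(nn.Module):
        """Hypothetical sketch: MMoE with a graph-based gate over regions."""
        def __init__(self, input_dim, expert_dim, num_experts, num_regions, region_sim):
            super().__init__()
            # Shared experts, as in a standard multi-gate mixture-of-experts.
            self.experts = nn.ModuleList(
                [nn.Sequential(nn.Linear(input_dim, expert_dim), nn.ReLU())
                 for _ in range(num_experts)])
            # One gate per region.
            self.gates = nn.ModuleList(
                [nn.Linear(input_dim, num_experts) for _ in range(num_regions)])
            # Row-normalized region-similarity prior ([R, R] tensor) used as a
            # fixed graph for one round of aggregation over per-region gates.
            self.register_buffer(
                "region_adj", region_sim / region_sim.sum(dim=1, keepdim=True))
            self.tower = nn.Linear(expert_dim, 1)

        def forward(self, x, region_id):
            expert_out = torch.stack([e(x) for e in self.experts], dim=1)  # [B, E, D]
            gate_logits = torch.stack([g(x) for g in self.gates], dim=1)   # [B, R, E]
            # Graph-based gating: each region's gate aggregates the gates of
            # similar regions through the similarity graph.
            mixed = torch.einsum("rs,bse->bre", self.region_adj, gate_logits)
            idx = torch.arange(x.size(0), device=x.device)
            gate = F.softmax(mixed[idx, region_id], dim=-1)                # [B, E]
            fused = (gate.unsqueeze(-1) * expert_out).sum(dim=1)           # [B, D]
            return torch.sigmoid(self.tower(fused)).squeeze(-1), expert_out

    def expert_contrastive_loss(out_v1, out_v2, temperature=0.1):
        # out_v1, out_v2: expert outputs [B, E, D] for two augmented views.
        z1 = F.normalize(out_v1, dim=-1)
        z2 = F.normalize(out_v2, dim=-1)
        # Alignment: two views of a sample encoded by the same expert are pulled together.
        align = (1.0 - (z1 * z2).sum(dim=-1)).mean()
        # Uniformity: the same view encoded by different experts is pushed apart.
        sim = torch.einsum("bed,bfd->bef", z1, z1) / temperature
        mask = torch.eye(sim.size(-1), dtype=torch.bool, device=sim.device)
        uniform = sim.masked_fill(mask, float("-inf")).exp().sum(dim=-1).log().mean()
        return align + uniform

In a training loop, the contrastive term would simply be added to the recommendation loss, e.g. loss = bce(pred, label) + lam * expert_contrastive_loss(out_v1, out_v2), where out_v1 and out_v2 come from two augmented views of the same batch; the weight lam and the augmentation scheme are not specified here and would follow the paper.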


Supplemental Material

1152-video.mp4 (mp4, 261.2 MB)


Published in

CIKM '23: Proceedings of the 32nd ACM International Conference on Information and Knowledge Management
October 2023, 5508 pages
ISBN: 9798400701245
DOI: 10.1145/3583780

Copyright © 2023 ACM. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher: Association for Computing Machinery, New York, NY, United States

Acceptance Rates

Overall acceptance rate: 1,861 of 8,427 submissions, 22%
