Abstract
Multi-task prediction in recommendation systems has garnered considerable attention, particularly with the success of Mixture-of-Experts (MoE) based models such as MMoE and PLE. In this paper, we first observe that many existing MoE-based models prioritize increasing model capacity for better online performance, which yields redundant or near-identical expert hidden representations and hurts both online performance and parameter efficiency. To address this, we introduce self-supervised learning to improve the alignment and uniformity of expert representations and propose Contrastive Learning for MoE models (CMoE), a framework built on two self-supervised signals: an Experts Homogeneity Penalty (EHP) and an Expert Agreement Regularization (EAR). The EHP drives different experts toward distinct hidden representations, while the EAR strengthens feature representation learning. We conducted experiments on real-world datasets for Click-Through Rate (CTR), Conversion Rate (CVR), and Deep Conversion Rate (DVR) prediction tasks; CMoE improves AUC by 1.24% over the baseline model. Online A/B tests further validated the approach, with a 3.27% lift in CTCVDVR and a 3.42% lift in ARPU (Average Revenue Per User). The code is available at https://github.com/BZX667/CMoE.
Z. Bai and K. Su contributed equally to this work.
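Only the abstract is reproduced here, so the paper's exact loss definitions are not available. The sketch below is one plausible PyTorch reading of the two self-supervised signals: the function names, tensor shapes, temperature, and the InfoNCE-style form of the EAR are all assumptions for illustration, not the paper's formulation.

```python
# Hedged sketch of the two CMoE-style self-supervised signals described in the
# abstract. Names, shapes, and loss forms are illustrative assumptions.
import torch
import torch.nn.functional as F


def experts_homogeneity_penalty(expert_outputs: torch.Tensor) -> torch.Tensor:
    """Penalize similar hidden representations across experts.

    expert_outputs: (batch, num_experts, dim) stacked expert representations.
    Minimizing the mean off-diagonal cosine similarity pushes experts apart,
    promoting uniformity of the expert representations.
    """
    e = F.normalize(expert_outputs, dim=-1)
    sim = torch.einsum('bnd,bmd->bnm', e, e)           # pairwise cosine sims per sample
    n = e.size(1)
    off_diag = ~torch.eye(n, dtype=torch.bool, device=e.device)
    return sim[:, off_diag].mean()                     # high when experts are redundant


def expert_agreement_regularization(view_a: torch.Tensor,
                                    view_b: torch.Tensor,
                                    temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE-style alignment between two views of the same inputs
    (e.g., representations under feature dropout) -- one plausible way to
    "enhance feature representation learning" as the abstract puts it.

    view_a, view_b: (batch, dim) paired representations.
    """
    a = F.normalize(view_a, dim=-1)
    b = F.normalize(view_b, dim=-1)
    logits = a @ b.t() / temperature                   # (batch, batch) similarity matrix
    labels = torch.arange(a.size(0), device=a.device)  # positives on the diagonal
    return F.cross_entropy(logits, labels)


# Hypothetical usage: add both signals to the multi-task loss with tunable weights.
# total_loss = task_loss + lambda_ehp * ehp_loss + lambda_ear * ear_loss
```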
References
Chen, T., Kornblith, S., Norouzi, M., Hinton, G.: A simple framework for contrastive learning of visual representations. In: International Conference on Machine Learning, pp. 1597–1607. PMLR (2020)
Gao, T., Yao, X., Chen, D.: SimCSE: simple contrastive learning of sentence embeddings. arXiv preprint arXiv:2104.08821 (2021)
He, K., Fan, H., Wu, Y., Xie, S., Girshick, R.: Momentum contrast for unsupervised visual representation learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9729–9738 (2020)
Jacobs, R.A., Jordan, M.I., Nowlan, S.J., Hinton, G.E.: Adaptive mixtures of local experts. Neural Comput. 3(1), 79–87 (1991)
Ma, J., Zhao, Z., Yi, X., Chen, J., Hong, L., Chi, E.H.: Modeling task relationships in multi-task learning with multi-gate mixture-of-experts. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1930–1939 (2018)
Pan, Y., Yao, J., Han, B., Jia, K., Zhang, Y., Yang, H.: Click-through rate prediction with auto-quantized contrastive learning. arXiv preprint arXiv:2109.13921 (2021)
Su, Y., Lan, T., Wang, Y., Yogatama, D., Kong, L., Collier, N.: A contrastive framework for neural text generation. Adv. Neural Inf. Process. Syst. 35, 21548–21561 (2022)
Tang, H., Liu, J., Zhao, M., Gong, X.: Progressive layered extraction (PLE): a novel multi-task learning (MTL) model for personalized recommendations. In: Proceedings of the 14th ACM Conference on Recommender Systems, pp. 269–278 (2020)
Wang, T., Isola, P.: Understanding contrastive representation learning through alignment and uniformity on the hypersphere. In: International Conference on Machine Learning, pp. 9929–9939. PMLR (2020)
Wen, H., Zhang, J., Wang, Y., Lv, F., Bao, W., Lin, Q., Yang, K.: Entire space multi-task modeling via post-click behavior decomposition for conversion rate prediction. In: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 2377–2386 (2020)
Xie, X., et al.: Contrastive learning for sequential recommendation. In: 2022 IEEE 38th International Conference on Data Engineering (ICDE), pp. 1259–1273. IEEE (2022)
Yao, T., et al.: Self-supervised learning for large-scale item recommendations. In: Proceedings of the 30th ACM International Conference on Information & Knowledge Management, pp. 4321–4330 (2021)
Yu, J., Yin, H., Xia, X., Chen, T., Cui, L., Nguyen, Q.V.H.: Are graph augmentations necessary? Simple graph contrastive learning for recommendation. In: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1294–1303 (2022)
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Bai, Z., Su, K., Zhu, X., Xiong, Y. (2024). Improving Alignment and Uniformity of Expert Representation with Contrastive Learning for Mixture-of-Experts Model. In: Onizuka, M., et al. (eds.) Database Systems for Advanced Applications. DASFAA 2024. Lecture Notes in Computer Science, vol. 14856. Springer, Singapore. https://doi.org/10.1007/978-981-97-5575-2_21
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-5574-5
Online ISBN: 978-981-97-5575-2