Abstract
Multi-task prediction in recommendation systems has garnered considerable attention, particularly with the success of Mixture-of-Experts (MoE) based models such as MMoE and PLE. In this paper, we first observe that many existing MoE-based models prioritize increasing model capacity for better online performance, which yields redundant or near-identical expert hidden representations and hurts both online performance and parameter efficiency. To address this, we introduce self-supervised learning to improve the alignment and uniformity of expert representations and propose Contrastive Learning for MoE models (CMoE), a framework built on two self-supervised signals: an Experts Homogeneity Penalty (EHP) and an Expert Agreement Regularization (EAR). The EHP drives different experts toward distinct hidden representations, while the EAR strengthens feature representation learning. We conducted experiments on real-world datasets for Click-Through Rate (CTR), Conversion Rate (CVR), and Deep Conversion Rate (DVR) prediction tasks; CMoE improves AUC by 1.24% over the baseline model. Online A/B tests further validated the approach, with a 3.27% lift in CTCVDVR and a 3.42% lift in ARPU (Average Revenue Per User). The code is available at https://github.com/BZX667/CMoE.
Z. Bai and K. Su contributed equally to this work.
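Only the abstract is reproduced here, so the paper's exact loss definitions are not available. The sketch below is one plausible PyTorch reading of the two self-supervised signals: the function names, tensor shapes, temperature, and the InfoNCE-style form of the EAR are all assumptions for illustration, not the paper's formulation.

```python
# Hedged sketch of the two CMoE-style self-supervised signals described in the
# abstract. Names, shapes, and loss forms are illustrative assumptions.
import torch
import torch.nn.functional as F


def experts_homogeneity_penalty(expert_outputs: torch.Tensor) -> torch.Tensor:
    """Penalize similar hidden representations across experts.

    expert_outputs: (batch, num_experts, dim) stacked expert representations.
    Minimizing the mean off-diagonal cosine similarity pushes experts apart,
    promoting uniformity of the expert representations.
    """
    e = F.normalize(expert_outputs, dim=-1)
    sim = torch.einsum('bnd,bmd->bnm', e, e)           # pairwise cosine sims per sample
    n = e.size(1)
    off_diag = ~torch.eye(n, dtype=torch.bool, device=e.device)
    return sim[:, off_diag].mean()                     # high when experts are redundant


def expert_agreement_regularization(view_a: torch.Tensor,
                                    view_b: torch.Tensor,
                                    temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE-style alignment between two views of the same inputs
    (e.g., representations under feature dropout) -- one plausible way to
    "enhance feature representation learning" as the abstract puts it.

    view_a, view_b: (batch, dim) paired representations.
    """
    a = F.normalize(view_a, dim=-1)
    b = F.normalize(view_b, dim=-1)
    logits = a @ b.t() / temperature                   # (batch, batch) similarity matrix
    labels = torch.arange(a.size(0), device=a.device)  # positives on the diagonal
    return F.cross_entropy(logits, labels)


# Hypothetical usage: add both signals to the multi-task loss with tunable weights.
# total_loss = task_loss + lambda_ehp * ehp_loss + lambda_ear * ear_loss
```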
References
Chen, T., Kornblith, S., Norouzi, M., Hinton, G.: A simple framework for contrastive learning of visual representations. In: International Conference on Machine Learning, pp. 1597–1607. PMLR (2020)
Gao, T., Yao, X., Chen, D.: SimCSE: simple contrastive learning of sentence embeddings. arXiv preprint arXiv:2104.08821 (2021)
He, K., Fan, H., Wu, Y., Xie, S., Girshick, R.: Momentum contrast for unsupervised visual representation learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9729–9738 (2020)
Jacobs, R.A., Jordan, M.I., Nowlan, S.J., Hinton, G.E.: Adaptive mixtures of local experts. Neural Comput. 3(1), 79–87 (1991)
Ma, J., Zhao, Z., Yi, X., Chen, J., Hong, L., Chi, E.H.: Modeling task relationships in multi-task learning with multi-gate mixture-of-experts. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1930–1939 (2018)
Pan, Y., Yao, J., Han, B., Jia, K., Zhang, Y., Yang, H.: Click-through rate prediction with auto-quantized contrastive learning. arXiv preprint arXiv:2109.13921 (2021)
Su, Y., Lan, T., Wang, Y., Yogatama, D., Kong, L., Collier, N.: A contrastive framework for neural text generation. Adv. Neural Inf. Process. Syst. 35, 21548–21561 (2022)
Tang, H., Liu, J., Zhao, M., Gong, X.: Progressive layered extraction (PLE): a novel multi-task learning (MTL) model for personalized recommendations. In: Proceedings of the 14th ACM Conference on Recommender Systems, pp. 269–278 (2020)
Wang, T., Isola, P.: Understanding contrastive representation learning through alignment and uniformity on the hypersphere. In: International Conference on Machine Learning, pp. 9929–9939. PMLR (2020)
Wen, H., Zhang, J., Wang, Y., Lv, F., Bao, W., Lin, Q., Yang, K.: Entire space multi-task modeling via post-click behavior decomposition for conversion rate prediction. In: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 2377–2386 (2020)
Xie, X., et al.: Contrastive learning for sequential recommendation. In: 2022 IEEE 38th International Conference on Data Engineering (ICDE), pp. 1259–1273. IEEE (2022)
Yao, T., et al.: Self-supervised learning for large-scale item recommendations. In: Proceedings of the 30th ACM International Conference on Information & Knowledge Management, pp. 4321–4330 (2021)
Yu, J., Yin, H., Xia, X., Chen, T., Cui, L., Nguyen, Q.V.H.: Are graph augmentations necessary? Simple graph contrastive learning for recommendation. In: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1294–1303 (2022)
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Bai, Z., Su, K., Zhu, X., Xiong, Y. (2024). Improving Alignment and Uniformity of Expert Representation with Contrastive Learning for Mixture-of-Experts Model. In: Onizuka, M., et al. (eds.) Database Systems for Advanced Applications. DASFAA 2024. Lecture Notes in Computer Science, vol. 14856. Springer, Singapore. https://doi.org/10.1007/978-981-97-5575-2_21
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-5574-5
Online ISBN: 978-981-97-5575-2