Improving Alignment and Uniformity of Expert Representation with Contrastive Learning for Mixture-of-Experts Model

  • Conference paper
  • Appears in: Database Systems for Advanced Applications (DASFAA 2024)

Abstract

Multi-task prediction in recommendation systems has garnered considerable attention, particularly with the success of Mixture-of-Experts (MoE) based models such as MMoE and PLE. In this paper, we first observe that many existing MoE-based models prioritize increasing model capacity for better online performance, which results in redundant or near-identical expert hidden representations and hurts both online performance and parameter efficiency. To address this, we introduce self-supervised learning to improve the alignment and uniformity of expert representations and propose Contrastive Learning for MoE models (CMoE), a framework consisting of two self-supervised learning signals: the Experts Homogeneity Penalty (EHP) and the Expert Agreement Regularization (EAR). The EHP encourages distinct hidden representations across experts, while the EAR strengthens feature representation learning. We conducted experiments on real-world datasets for Click-Through Rate (CTR), Conversion Rate (CVR), and Deep Conversion Rate (DVR) prediction tasks. The results show significant improvements, including a 1.24% increase in AUC over the baseline model. Online A/B tests also validate the approach, with a 3.27% improvement in CTCVDVR and a 3.42% improvement in ARPU (Average Revenue Per User). The code is available at https://github.com/BZX667/CMoE.
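This page excerpt does not include the paper's loss formulations, so the sketch below is only one plausible reading of the two signals in PyTorch: EHP as a pairwise similarity penalty that pushes different experts' hidden representations apart, and EAR as an InfoNCE-style agreement loss between two views of the same inputs. All function names, tensor shapes, and the choice of InfoNCE for EAR are assumptions, not the authors' exact method.

```python
import torch
import torch.nn.functional as F


def experts_homogeneity_penalty(expert_outputs: torch.Tensor) -> torch.Tensor:
    """Penalize similar hidden representations across experts (assumed EHP form).

    expert_outputs: (num_experts, batch, dim) stacked expert hidden states.
    Returns the mean pairwise cosine similarity between distinct experts;
    minimizing it encourages experts to stay dissimilar.
    """
    z = F.normalize(expert_outputs, dim=-1)  # work in cosine geometry
    num_experts = z.size(0)
    penalty, pairs = z.new_zeros(()), 0
    for i in range(num_experts):
        for j in range(i + 1, num_experts):
            # Mean cosine similarity between expert i and expert j over the batch.
            penalty = penalty + (z[i] * z[j]).sum(-1).mean()
            pairs += 1
    return penalty / max(pairs, 1)


def expert_agreement_regularization(view_a: torch.Tensor,
                                    view_b: torch.Tensor,
                                    temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE-style agreement between two augmented views (assumed EAR form).

    view_a, view_b: (batch, dim) representations of two views of the same rows;
    matching rows are positives, all other rows in the batch are negatives.
    """
    a = F.normalize(view_a, dim=-1)
    b = F.normalize(view_b, dim=-1)
    logits = a @ b.t() / temperature  # (batch, batch) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, targets)


# Example: three experts, batch of 8, hidden size 16; in training these losses
# would be added to the supervised multi-task objective with tunable weights.
h = torch.randn(3, 8, 16)
loss_ehp = experts_homogeneity_penalty(h)
loss_ear = expert_agreement_regularization(torch.randn(8, 16), torch.randn(8, 16))
```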

Z. Bai and K. Su contributed equally to this work.



Author information


Corresponding author

Correspondence to Yun Xiong.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Bai, Z., Su, K., Zhu, X., Xiong, Y. (2024). Improving Alignment and Uniformity of Expert Representation with Contrastive Learning for Mixture-of-Experts Model. In: Onizuka, M., et al. Database Systems for Advanced Applications. DASFAA 2024. Lecture Notes in Computer Science, vol 14856. Springer, Singapore. https://doi.org/10.1007/978-981-97-5575-2_21


  • DOI: https://doi.org/10.1007/978-981-97-5575-2_21

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-5574-5

  • Online ISBN: 978-981-97-5575-2

  • eBook Packages: Computer Science, Computer Science (R0)
