
RLMixer: A Reinforcement Learning Approach for Integrated Ranking with Contrastive User Preference Modeling

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2023)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13937)


Abstract

There is a strong need for industrial recommender systems to output an integrated ranking of items from different categories, such as video and news, to maximize overall user satisfaction. Integrated ranking faces two critical challenges. First, there is no universal metric to evaluate the contribution of each item because of the large discrepancies between items. Second, a user's short-term preference may shift quickly between diverse items during her interaction with the recommender system. To address these challenges, we propose a reinforcement learning (RL) based framework called RLMixer for the sequential integrated ranking problem. Benefiting from its credit assignment mechanism, RLMixer decomposes the overall user satisfaction into the contributions of items from different categories, making them comparable. To capture the user's short-term preference, RLMixer explicitly learns user interest vectors with a carefully designed contrastive loss. In addition, RLMixer is trained in a fully offline manner for convenience in industrial applications. We show that RLMixer significantly outperforms various baselines on both the public PRM datasets and industrial datasets collected from a widely used AppStore. We also conduct online A/B tests on millions of users through the AppStore. The results show that RLMixer brings a significant revenue gain of over 4%.
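
The abstract does not spell out the form of the contrastive loss used to learn the user interest vectors. As a rough, non-authoritative sketch, an InfoNCE-style objective with in-batch negatives is one standard way such vectors could be trained; the function name, tensor shapes, and temperature below are illustrative assumptions, not RLMixer's actual formulation.

    import torch
    import torch.nn.functional as F

    def contrastive_interest_loss(interest_vecs, positive_item_vecs, temperature=0.1):
        """Illustrative InfoNCE-style loss (not the paper's exact objective).

        Pulls each user's interest vector toward the embedding of the item she
        actually engaged with and pushes it away from the other items in the
        batch, which act as in-batch negatives. Both inputs: (batch, dim).
        """
        u = F.normalize(interest_vecs, dim=-1)
        v = F.normalize(positive_item_vecs, dim=-1)
        logits = u @ v.t() / temperature                    # (batch, batch) cosine similarities
        labels = torch.arange(u.size(0), device=u.device)   # diagonal entries are the positives
        return F.cross_entropy(logits, labels)

Minimizing such a loss makes each user's interest vector most similar to the item she interacted with, which is the usual way a contrastive objective is set up to capture short-term preference.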

J. Wang and M. Zhao—The first two authors contributed equally to this work.


Author information


Corresponding author

Correspondence to Guangyong Chen.



Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Wang, J. et al. (2023). RLMixer: A Reinforcement Learning Approach for Integrated Ranking with Contrastive User Preference Modeling. In: Kashima, H., Ide, T., Peng, W.-C. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2023. Lecture Notes in Computer Science (LNAI), vol 13937. Springer, Cham. https://doi.org/10.1007/978-3-031-33380-4_31


  • DOI: https://doi.org/10.1007/978-3-031-33380-4_31

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-33379-8

  • Online ISBN: 978-3-031-33380-4

  • eBook Packages: Computer Science, Computer Science (R0)
