
RLMixer: A Reinforcement Learning Approach for Integrated Ranking with Contrastive User Preference Modeling

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2023)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13937)


Abstract

There is a strong need for industrial recommender systems to output an integrated ranking of items from different categories, such as video and news, to maximize overall user satisfaction. Integrated ranking faces two critical challenges. First, there is no universal metric to evaluate the contribution of each item because of the large discrepancies between items. Second, a user's short-term preference may shift quickly between diverse items during her interaction with the recommender system. To address these challenges, we propose a reinforcement learning (RL) based framework called RLMixer for the sequential integrated ranking problem. Benefiting from its credit assignment mechanism, RLMixer decomposes the overall user satisfaction into the contributions of items from different categories, making them comparable. To capture the user's short-term preference, RLMixer explicitly learns user interest vectors with a carefully designed contrastive loss. In addition, RLMixer is trained in a fully offline manner for convenience in industrial applications. We show that RLMixer significantly outperforms various baselines on both the public PRM datasets and industrial datasets collected from a widely used AppStore. We also conduct online A/B tests on millions of users through the AppStore. The results show that RLMixer brings a significant revenue gain of over 4%.
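
The abstract does not spell out the form of the contrastive loss used to learn the user interest vectors. As a rough, non-authoritative sketch, an InfoNCE-style objective with in-batch negatives is one standard way such vectors could be trained; the function name, tensor shapes, and temperature below are illustrative assumptions, not RLMixer's actual formulation.

    import torch
    import torch.nn.functional as F

    def contrastive_interest_loss(interest_vecs, positive_item_vecs, temperature=0.1):
        """Illustrative InfoNCE-style loss (not the paper's exact objective).

        Pulls each user's interest vector toward the embedding of the item she
        actually engaged with and pushes it away from the other items in the
        batch, which act as in-batch negatives. Both inputs: (batch, dim).
        """
        u = F.normalize(interest_vecs, dim=-1)
        v = F.normalize(positive_item_vecs, dim=-1)
        logits = u @ v.t() / temperature                    # (batch, batch) cosine similarities
        labels = torch.arange(u.size(0), device=u.device)   # diagonal entries are the positives
        return F.cross_entropy(logits, labels)

Minimizing such a loss makes each user's interest vector most similar to the item she interacted with, which is the usual way a contrastive objective is set up to capture short-term preference.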

J. Wang and M. Zhao—The first two authors contributed equally to this work.


Author information


Corresponding author

Correspondence to Guangyong Chen.



Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Wang, J. et al. (2023). RLMixer: A Reinforcement Learning Approach for Integrated Ranking with Contrastive User Preference Modeling. In: Kashima, H., Ide, T., Peng, W.-C. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2023. Lecture Notes in Computer Science (LNAI), vol 13937. Springer, Cham. https://doi.org/10.1007/978-3-031-33380-4_31


  • DOI: https://doi.org/10.1007/978-3-031-33380-4_31

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-33379-8

  • Online ISBN: 978-3-031-33380-4

  • eBook Packages: Computer Science, Computer Science (R0)
