research-article

Deriving User- and Content-specific Rewards for Contextual Bandits

Authors:

Rishabh Mehrotra,

Mounia Lalmas

WWW '19: The World Wide Web Conference

Pages 2680 - 2686

https://doi.org/10.1145/3308558.3313592

Published: 13 May 2019

Abstract

Bandit algorithms have gained increased attention in recommender systems, as they provide effective and scalable recommendations. These algorithms use reward functions, usually based on a numeric variable such as click-through rate, as the basis for optimization. On a popular music streaming service, a contextual bandit algorithm is used to decide which content to recommend to users, where the reward function is a binarization of a numeric variable that defines success based on a static threshold of user streaming time: 1 if the user streamed for at least 30 seconds and 0 otherwise. We explore alternative methods to provide a more informed reward function, based on the assumption that the streaming time distribution depends heavily on the type of user and the type of content being streamed. To automatically extract user and content groups from streaming data, we employ "co-clustering", an unsupervised learning technique that simultaneously extracts clusters of rows and columns from a co-occurrence matrix. The streaming distributions within the co-clusters are then used to define rewards specific to each co-cluster. Our proposed co-clustering-based reward functions lead to an improvement of over 25% in expected stream rate, compared to the standard binarized rewards.
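
The contrast between the static reward and a co-cluster-specific reward can be sketched in a few lines of Python. This is a minimal illustration, not the authors' method: the co-cluster labels are assumed to be given already (the co-clustering step itself is not shown), the function names and the sample log are hypothetical, and using the median streaming time of each co-cluster as its threshold is an assumption made only for the example, since the abstract does not state how the within-cluster distributions are turned into rewards.

```python
from collections import defaultdict
from statistics import median

STATIC_THRESHOLD_S = 30.0  # from the abstract: reward is 1 if streamed >= 30 s


def static_reward(stream_seconds):
    """Baseline: binarize streaming time against one global threshold."""
    return 1 if stream_seconds >= STATIC_THRESHOLD_S else 0


def fit_cocluster_thresholds(events):
    """Compute one threshold per (user_group, content_group) co-cluster.

    `events` is an iterable of (user_group, content_group, stream_seconds).
    Taking the median is an illustrative assumption, not the paper's rule.
    """
    by_cluster = defaultdict(list)
    for user_group, content_group, seconds in events:
        by_cluster[(user_group, content_group)].append(seconds)
    return {key: median(values) for key, values in by_cluster.items()}


def cocluster_reward(user_group, content_group, stream_seconds, thresholds):
    """Binarize against the co-cluster's own threshold.

    Falls back to the static 30 s threshold for unseen co-clusters.
    """
    threshold = thresholds.get((user_group, content_group), STATIC_THRESHOLD_S)
    return 1 if stream_seconds >= threshold else 0


# Hypothetical log entries: (user_group, content_group, stream_seconds)
events = [
    (0, 0, 20.0), (0, 0, 25.0), (0, 0, 40.0),     # short listening sessions
    (1, 1, 300.0), (1, 1, 600.0), (1, 1, 900.0),  # long listening sessions
]
thresholds = fit_cocluster_thresholds(events)

# A 45 s stream counts as success under the static rule, but not inside a
# co-cluster whose typical stream lasts several minutes.
print(static_reward(45.0), cocluster_reward(1, 1, 45.0, thresholds))
```

Under these assumptions the same streaming time can earn different rewards depending on the co-cluster it falls in, which is the behaviour the abstract motivates.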

Published In

WWW '19: The World Wide Web Conference

May 2019

3620 pages

ISBN:9781450366748

DOI:10.1145/3308558

Editors: Ling Liu (Georgia Tech, USA), Ryen White (Microsoft Research, USA)

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

IW3C2: International World Wide Web Conference Committee

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

WWW '19

WWW '19: The Web Conference

May 13 - 17, 2019

San Francisco, CA, USA

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
