DOI: 10.1145/3308558.3313592
research-article

Deriving User- and Content-specific Rewards for Contextual Bandits

Published: 13 May 2019

Abstract

Bandit algorithms have gained increased attention in recommender systems, as they provide effective and scalable recommendations. These algorithms use reward functions, usually based on a numeric variable such as click-through rate, as the basis for optimization. On a popular music streaming service, a contextual bandit algorithm is used to decide which content to recommend to users, where the reward function is a binarization of a numeric variable that defines success based on a static threshold of user streaming time: 1 if the user streamed for at least 30 seconds and 0 otherwise. We explore alternative methods to provide a more informed reward function, based on the assumption that the distribution of streaming time depends heavily on the type of user and the type of content being streamed. To automatically extract user and content groups from streaming data, we employ "co-clustering", an unsupervised learning technique that simultaneously extracts clusters of rows and columns from a co-occurrence matrix. The streaming distributions within the co-clusters are then used to define rewards specific to each co-cluster. Our proposed co-clustering-based reward functions lead to an improvement of over 25% in expected stream rate, compared to the standard binarized rewards.
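
To make the contrast between the two reward definitions concrete, the following is a minimal sketch, not the authors' implementation. It assumes the co-cluster assignments are already available (the dictionaries user_cluster and content_cluster are hypothetical inputs), and it uses the median streaming time within each co-cluster as the success threshold. The abstract does not say how the within-cluster distributions are turned into rewards, so the median is purely an illustrative choice. All names and toy data are invented for this example.

```python
from collections import defaultdict
from statistics import median

BASELINE_THRESHOLD_S = 30.0  # static threshold stated in the abstract


def baseline_reward(stream_seconds):
    """Binarized reward: 1 if the user streamed for at least 30 seconds."""
    return 1 if stream_seconds >= BASELINE_THRESHOLD_S else 0


def cocluster_thresholds(events, user_cluster, content_cluster):
    """Median streaming time per (user cluster, content cluster) pair.

    The median is an assumption made for illustration only.
    """
    times = defaultdict(list)
    for user, content, seconds in events:
        key = (user_cluster[user], content_cluster[content])
        times[key].append(seconds)
    return {key: median(values) for key, values in times.items()}


def cocluster_reward(user, content, seconds, thresholds,
                     user_cluster, content_cluster):
    """Binarized reward using the threshold of the matching co-cluster.

    Falls back to the static 30-second threshold for unseen co-clusters.
    """
    key = (user_cluster[user], content_cluster[content])
    threshold = thresholds.get(key, BASELINE_THRESHOLD_S)
    return 1 if seconds >= threshold else 0


# Hypothetical toy data: (user, content, streaming time in seconds).
user_cluster = {"u1": 0, "u2": 0, "u3": 1}
content_cluster = {"podcast": 0, "pop_playlist": 1}
events = [
    ("u1", "podcast", 600.0),
    ("u2", "podcast", 1200.0),
    ("u1", "podcast", 45.0),
    ("u3", "pop_playlist", 20.0),
    ("u3", "pop_playlist", 40.0),
    ("u3", "pop_playlist", 90.0),
]
thresholds = cocluster_thresholds(events, user_cluster, content_cluster)
```

In this toy data a 45-second podcast stream counts as a success under the static threshold but not under the co-cluster threshold, because streams in that co-cluster are typically much longer.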


Published In

WWW '19: The World Wide Web Conference
May 2019
3620 pages
ISBN:9781450366748
DOI:10.1145/3308558

In-Cooperation

  • IW3C2: International World Wide Web Conference Committee

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

WWW '19
WWW '19: The Web Conference
May 13 - 17, 2019
San Francisco, CA, USA

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Cited By

  • (2024) Guided SQL-Based Data Exploration with User Feedback. 2024 IEEE 40th International Conference on Data Engineering (ICDE), 4884-4896. DOI: 10.1109/ICDE60146.2024.00372. Online publication date: 13-May-2024
  • (2024) Quantum contextual bandits and recommender systems for quantum data. Quantum Machine Intelligence 6:2. DOI: 10.1007/s42484-024-00189-6. Online publication date: 12-Sep-2024
  • (2023) Intent-Satisfaction Modeling: From Music to Video Streaming. ACM Transactions on Recommender Systems 1:3, 1-23. DOI: 10.1145/3606375. Online publication date: 7-Aug-2023
  • (2023) Leveling Up the Peloton Homescreen: A System and Algorithm for Dynamic Row Ranking. Proceedings of the 17th ACM Conference on Recommender Systems, 1062-1066. DOI: 10.1145/3604915.3610247. Online publication date: 14-Sep-2023
  • (2020) Deconfounding User Satisfaction Estimation from Response Rate Bias. Proceedings of the 14th ACM Conference on Recommender Systems, 450-455. DOI: 10.1145/3383313.3412208. Online publication date: 22-Sep-2020
  • (2020) Learning to Rank in the Position Based Model with Bandit Feedback. Proceedings of the 29th ACM International Conference on Information & Knowledge Management, 2405-2412. DOI: 10.1145/3340531.3412723. Online publication date: 19-Oct-2020
  • (2019) Recommendations in a marketplace. Proceedings of the 13th ACM Conference on Recommender Systems, 580-581. DOI: 10.1145/3298689.3346952. Online publication date: 10-Sep-2019
