DOI: 10.1145/3394486.3403374
Research article

Bandit based Optimization of Multiple Objectives on a Music Streaming Platform

Published: 20 August 2020

ABSTRACT

Recommender systems powering online multi-stakeholder platforms often face the challenge of jointly optimizing multiple objectives in an attempt to efficiently match suppliers and consumers. Examples of such objectives include user behavioral metrics (e.g., clicks, streams, dwell time), supplier exposure objectives (e.g., diversity), and platform-centric objectives (e.g., promotions). Jointly optimizing multiple metrics in online recommender systems remains a challenging task. Recent work has demonstrated the prowess of contextual bandits in powering recommendation systems that serve recommendations of interest to users. This paper aims at extending contextual bandits to the multi-objective setting so as to power recommendations on multi-stakeholder platforms.

Specifically, in a contextual bandit setting, we learn a recommendation policy that can optimize multiple objectives simultaneously in a fair way. This multi-objective online optimization problem is formalized using the Generalized Gini index (GGI) aggregation function, which combines and balances multiple objectives together. We propose an online gradient ascent learning algorithm to maximize the long-term vectorial rewards for the different objectives, scalarized using the GGI function. Through extensive experiments on simulated data and large-scale music recommendation data from Spotify, a streaming platform, we show that the proposed algorithm learns a policy that performs better across the disparate objectives than other state-of-the-art approaches.
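The GGI scalarization described above can be illustrated with a short sketch. Following the standard definition (Weymark 1981; Busa-Fekete et al. 2017), GGI sorts the objective values from worst to best and pairs them with non-increasing weights, so the worst-performing objective receives the largest weight. The function name and the example weights below are illustrative choices, not taken from the paper.

```python
import numpy as np

def ggi(rewards, weights):
    """Generalized Gini index aggregation of a vector of objective values.

    Objectives are sorted in ascending order (worst first) and paired
    with non-increasing weights, so the worst objective gets the largest
    weight -- a fairness-inducing scalarization.
    """
    x = np.sort(np.asarray(rewards, dtype=float))  # ascending: worst first
    w = np.asarray(weights, dtype=float)           # assumed non-increasing
    return float(np.dot(w, x))

# Illustrative non-increasing weights for three objectives
weights = np.array([1.0, 0.5, 0.25])

balanced = ggi([0.6, 0.6, 0.6], weights)  # 0.6*1 + 0.6*0.5 + 0.6*0.25 = 1.05
skewed   = ggi([0.2, 0.6, 1.0], weights)  # 0.2*1 + 0.6*0.5 + 1.0*0.25 = 0.75
# Both reward vectors have the same mean, but the balanced vector
# scores higher under GGI, which is the fairness behavior the paper exploits.
```

The learning algorithm in the paper then performs online gradient ascent on policy parameters to maximize this scalarized long-term reward; the sketch above shows only the aggregation step.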


Published in
          KDD '20: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
          August 2020
          3664 pages
          ISBN:9781450379984
          DOI:10.1145/3394486

          Copyright © 2020 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States


          Acceptance Rates

Overall acceptance rate: 1,133 of 8,635 submissions (13%)

