Abstract
This work presents an extension of the Thompson Sampling bandit policy for orchestrating a collection of base recommendation algorithms for e-commerce. We focus on the problem of item-to-item recommendations, for which multiple behavioral and attribute-based predictors are provided to an ensemble learner. In addition, we detail the construction of a personalized predictor based on k-Nearest Neighbors (kNN), with temporal decay capabilities and event weighting. We show how to adapt Thompson Sampling to realistic situations in which neither action availability nor reward stationarity is guaranteed. Furthermore, we investigate the effects of priming the sampler with pre-set parameters of the reward probability distributions, derived from the product catalog and/or event history when such information is available. We report our experimental results based on the analysis of three real-world e-commerce datasets.
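The mechanism the abstract describes can be sketched in a few lines: a Beta-Bernoulli Thompson Sampler that treats each base recommender as an arm, supports priming with pre-set prior parameters, restricts sampling to the currently available arms, and discounts past evidence to cope with non-stationary rewards. This is a minimal illustrative sketch, not the paper's implementation; the class name, the `priors` argument, and the geometric `discount` scheme are assumptions made here for exposition.

```python
import random

class ThompsonSamplingEnsemble:
    """Minimal Beta-Bernoulli Thompson Sampling over base recommenders (illustrative)."""

    def __init__(self, n_arms, priors=None, discount=1.0):
        # priors: optional list of (alpha, beta) pairs used to prime the sampler,
        # e.g. estimated from the product catalog or event history.
        # discount < 1.0 geometrically decays past evidence (non-stationary rewards).
        self.params = list(priors) if priors else [(1.0, 1.0)] * n_arms
        self.discount = discount

    def select(self, available=None):
        # Sample a reward estimate for each *available* arm; play the argmax.
        arms = available if available is not None else range(len(self.params))
        return max(arms, key=lambda k: random.betavariate(*self.params[k]))

    def update(self, arm, reward):
        # Decay all posteriors, then credit the played arm with a 0/1 reward.
        self.params = [(a * self.discount, b * self.discount) for a, b in self.params]
        a, b = self.params[arm]
        self.params[arm] = (a + reward, b + (1 - reward))
```

For example, priming with `priors=[(5, 1), (1, 5)]` biases early selections toward the first arm until observed rewards accumulate, while `discount=0.9` lets the sampler track a drifting click-through rate instead of averaging over its whole history.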
Index Terms
- A Bandit-Based Ensemble Framework for Exploration/Exploitation of Diverse Recommendation Components: An Experimental Study within E-Commerce