ABSTRACT
Interactive recommendation has been widely framed as a Multi-Armed Bandit (MAB) problem: items are arms to be pulled (i.e., recommended) and the user's satisfaction is the reward to be maximized. Despite recent advances, there is still no consensus on the best practices for evaluating such solutions. Recently, two complementary frameworks were proposed to evaluate bandit solutions more accurately: iRec and OBP. The former provides a comprehensive set of offline metrics and bandit models, enabling comparisons under several evaluation policies. The latter offers a large set of bandit models that can be evaluated through several counterfactual estimators. However, there is still room to explore in joining these two frameworks. We propose and evaluate an integration between them, demonstrating the potential and richness of such a combination.
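To make the two evaluation styles concrete, the sketch below simulates logged bandit feedback of the kind an iRec-style offline loop produces (a logging policy recommends items and observes binary rewards) and then evaluates a different target policy counterfactually with an Inverse Propensity Scoring (IPS) estimator, the simplest of the counterfactual estimators implemented in OBP. This is a minimal, self-contained illustration using only NumPy under assumed uniform logging propensities; the epsilon-greedy target policy and all variable names are illustrative assumptions and do not reflect either framework's actual API.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy logged bandit feedback: a uniform-random logging policy recommends one of
# n_actions items per round; rewards come from hidden per-item click rates.
n_rounds, n_actions = 10_000, 5
true_ctr = rng.uniform(0.05, 0.30, size=n_actions)    # hidden ground truth
logging_pscore = np.full(n_actions, 1.0 / n_actions)  # uniform propensities

actions = rng.integers(0, n_actions, size=n_rounds)   # logged recommendations
rewards = rng.binomial(1, true_ctr[actions])          # logged user feedback
pscores = logging_pscore[actions]                     # propensity of each logged action

# Target policy to evaluate counterfactually: epsilon-greedy around the
# empirically best item in the log (an illustrative choice).
epsilon = 0.1
empirical_ctr = np.array([
    rewards[actions == a].mean() if np.any(actions == a) else 0.0
    for a in range(n_actions)
])
best = empirical_ctr.argmax()
target_probs = np.full(n_actions, epsilon / n_actions)
target_probs[best] += 1.0 - epsilon

# Inverse Propensity Scoring (IPS): reweight logged rewards by the ratio of
# target to logging propensities for the action actually taken.
ips_value = np.mean(rewards * target_probs[actions] / pscores)

# Ground-truth value of the target policy (available only in simulation).
true_value = float(target_probs @ true_ctr)

print(f"IPS estimate of target policy value: {ips_value:.4f}")
print(f"True value of target policy:         {true_value:.4f}")
```

In an integration of the kind proposed here, the logged actions, rewards, and propensities would come from an interaction loop such as iRec's, while the hand-rolled IPS line would be replaced by the counterfactual estimators shipped with OBP.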
REFERENCES
- Marc Abeille and Alessandro Lazaric. 2017. Linear thompson sampling revisited. In Artificial Intelligence and Statistics. PMLR, 176–184. https://doi.org/10.48550/arXiv.1611.06534
- Peter Auer. 2002. Using confidence bounds for exploitation-exploration trade-offs. Journal of Machine Learning Research 3, Nov (2002), 397–422. https://doi.org/10.1162/153244303321897663
- Peter Auer, Nicolo Cesa-Bianchi, and Paul Fischer. 2002. Finite-time analysis of the multiarmed bandit problem. Machine Learning 47, 2-3 (2002), 235–256. https://doi.org/10.1023/A:1013689704352
- Olivier Chapelle and Lihong Li. 2011. An empirical evaluation of thompson sampling. In Advances in Neural Information Processing Systems. 2249–2257.
- Jaya Kawale, Hung H Bui, Branislav Kveton, Long T Thanh, and Sanjay Chawla. 2015. Efficient thompson sampling for online matrix-factorization recommendation. Advances in Neural Information Processing Systems 28 (2015), 1297–1305. https://doi.org/10.5555/2969239.2969384
- Shuai Li, Alexandros Karatzoglou, and Claudio Gentile. 2016. Collaborative filtering bandits. In Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval. 539–548.
- Yaxu Liu, Jui-Nan Yen, Bowen Yuan, Rundong Shi, Peng Yan, and Chih-Jen Lin. 2022. Practical Counterfactual Policy Learning for Top-K Recommendations. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 1141–1151.
- Weishen Pan, Sen Cui, Hongyi Wen, Kun Chen, Changshui Zhang, and Fei Wang. 2021. Correcting the User Feedback-Loop Bias for Recommendation Systems. arXiv preprint arXiv:2109.06037 (2021).
- Yuta Saito, Shunsuke Aihara, Megumi Matsutani, and Yusuke Narita. 2020. Open bandit dataset and pipeline: Towards realistic and reproducible off-policy evaluation. arXiv preprint arXiv:2008.07146 (2020).
- Javier Sanz-Cruzado, Pablo Castells, and Esther López. 2019. A simple multi-armed nearest-neighbor bandit for interactive recommendation. In Proceedings of the 13th ACM Conference on Recommender Systems. 358–362.
- Sulthana Shams, Daron Anderson, and Douglas Leith. 2021. Cluster-Based Bandits: Fast Cold-Start for Recommender System New Users. (2021).
- Nicollas Silva, Thiago Silva, Heitor Werneck, Leonardo Rocha, and Adriano Pereira. 2023. User Cold-Start Problem in Multi-Armed Bandits: When the First Recommendations Guide the User's Experience. ACM Trans. Recomm. Syst. 1, 1 (2023). https://doi.org/10.1145/3554819
- Nícollas Silva, Heitor Werneck, Thiago Silva, Adriano C. M. Pereira, and Leonardo Rocha. 2021. A contextual approach to improve the user's experience in interactive recommendation systems. In WebMedia '21: Brazilian Symposium on Multimedia and the Web, Belo Horizonte, Minas Gerais, Brazil, November 5-12, 2021, Adriano César Machado Pereira and Leonardo Chaves Dutra da Rocha (Eds.). ACM, 89–96. https://doi.org/10.1145/3470482.3479621
- Thiago Silva, Nícollas Silva, Carlos Mito, Adriano C. M. Pereira, and Leonardo Rocha. 2022. Interactive POI Recommendation: applying a Multi-Armed Bandit framework to characterise and create new models for this scenario. In WebMedia '22: Brazilian Symposium on Multimedia and Web, Curitiba, Brazil, November 7-11, 2022, Thiago Henrique Silva, Leyza Baldo Dorini, Jussara M. Almeida, and Humberto Torres Marques-Neto (Eds.). ACM, 211–221. https://doi.org/10.1145/3539637.3557060
- Thiago Silva, Nícollas Silva, Heitor Werneck, Carlos Mito, Adriano C. M. Pereira, and Leonardo Rocha. 2022. iRec: An interactive recommendation framework. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval. 3165–3175.
- Qing Wang, Chunqiu Zeng, Wubai Zhou, Tao Li, S. Sitharama Iyengar, Larisa Shwartz, and Genady Ya Grabarnik. 2018. Online interactive collaborative filtering using multi-armed bandit with dependent arms. IEEE Transactions on Knowledge and Data Engineering 31, 8 (2018), 1569–1580.
- Qingyun Wu, Naveen Iyer, and Hongning Wang. 2018. Learning contextual bandits in a non-stationary environment. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval. 495–504.
- Yanming Yang, Xin Xia, David Lo, and John Grundy. 2022. A survey on deep learning for software engineering. ACM Computing Surveys (CSUR) 54, 10s (2022), 1–73.
- Xiaoxue Zhao, Weinan Zhang, and Jun Wang. 2013. Interactive collaborative filtering. In Proceedings of the 22nd ACM International Conference on Information & Knowledge Management. 1411–1420.
- Sijin Zhou, Xinyi Dai, Haokun Chen, Weinan Zhang, Kan Ren, Ruiming Tang, Xiuqiang He, and Yong Yu. 2020. Interactive recommender system via knowledge graph-enhanced reinforcement learning. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 179–188.
- Lixin Zou, Long Xia, Yulong Gu, Xiangyu Zhao, Weidong Liu, Jimmy Xiangji Huang, and Dawei Yin. 2020. Neural Interactive Collaborative Filtering. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 749–758.
Recommendations
A comparative analysis of offline and online evaluations and discussion of research paper recommender system evaluation
RepSys '13: Proceedings of the International Workshop on Reproducibility and Replication in Recommender Systems Evaluation. Offline evaluations are the most common evaluation method for research paper recommender systems. However, no thorough discussion on the appropriateness of offline evaluations has taken place, despite some voiced criticism. We conducted a study in which ...
Revisiting offline evaluation for implicit-feedback recommender systems
RecSys '19: Proceedings of the 13th ACM Conference on Recommender Systems. Recommender systems are typically evaluated in an offline setting. A subset of the available user-item interactions is sampled to serve as test set, and some model trained on the remaining data points is then evaluated on its performance to predict ...
Bridging the Gap Between User-centric and Offline Evaluation of Personalized Recommendation Systems
UMAP '18: Adjunct Publication of the 26th Conference on User Modeling, Adaptation and Personalization. In this paper, we propose to evaluate recommender systems by conducting both offline and user-centric evaluations, while considering multiple quality aspects in realistic settings. This comprehensive evaluation would provide insight on how to improve ...