ABSTRACT
Improvements in the recommender systems (RS) domain are not possible without a thorough way to evaluate and compare newly proposed approaches. User studies represent a viable alternative to online and offline evaluation schemes, but despite their numerous benefits, they are only rarely used. One of the main reasons behind this fact is that preparing a user study from scratch involves a lot of extra work on top of a simple algorithm proposal. To simplify this task, we propose EasyStudy, a modular framework built on the credo “Make simple things fast and hard things possible”. It features ready-to-use datasets, preference elicitation methods, incrementally tuned baseline algorithms, study flow plugins, and evaluation metrics. As a result, a simple study comparing several RS can be deployed with just a few clicks, while more complex study designs can still benefit from a range of reusable components, such as preference elicitation. Overall, EasyStudy dramatically decreases the gap between the laboriousness of offline evaluation vs. user studies and, therefore, may contribute towards the more reliable and insightful user-centric evaluation of next-generation RS. The project repository is available from https://bit.ly/easy-study-repo.
- Vito Walter Anelli, Alejandro Bellogín, Antonio Ferrara, Daniele Malitesta, Felice Antonio Merra, Claudio Pomo, Francesco Maria Donini, and Tommaso Di Noia. 2021. Elliot: A Comprehensive and Rigorous Framework for Reproducible Recommender Systems Evaluation. In SIGIR ’21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, Canada, July 11-15, 2021, Fernando Diaz, Chirag Shah, Torsten Suel, Pablo Castells, Rosie Jones, and Tetsuya Sakai (Eds.). ACM, 2405–2414. https://doi.org/10.1145/3404835.3463245Google ScholarDigital Library
- Joeran Beel and Stefan Langer. 2015. A Comparison of Offline Evaluations, Online Evaluations, and User Studies in the Context of Research-Paper Recommender Systems. In Research and Advanced Technology for Digital Libraries, Sarantos Kapidakis, Cezary Mazurek, and Marcin Werla (Eds.). Springer International Publishing, Cham, 153–168.Google Scholar
- Patrik Dokoupil, Ladislav Peska, and Ludovico Boratto. 2023. EasyStudy: Framework for Easy Deployment of User Studies on Recommender Systems. In Seventeenth ACM Conference on Recommender Systems (Singapore, Singapore) (RecSys ’23). Association for Computing Machinery, New York, NY, USA, to appear. https://doi.org/10.1145/3604915.3610640Google ScholarDigital Library
- Patrik Dokoupil, Ladislav Peska, and Ludovico Boratto. 2023. Rows or Columns? Minimizing Presentation Bias When Comparing Multiple Recommender Systems. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (Taipei, Taiwan) (SIGIR ’23). Association for Computing Machinery, New York, NY, USA, 2354–2358. https://doi.org/10.1145/3539618.3592056Google ScholarDigital Library
- Michael D. Ekstrand. 2020. LensKit for Python: Next-Generation Software for Recommender Systems Experiments. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management (Virtual Event, Ireland) (CIKM ’20). Association for Computing Machinery, New York, NY, USA, 2999–3006. https://doi.org/10.1145/3340531.3412778Google ScholarDigital Library
- Michael D. Ekstrand. 2020. LensKit for Python: Next-Generation Software for Recommender Systems Experiments. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management (Virtual Event, Ireland) (CIKM ’20). Association for Computing Machinery, New York, NY, USA, 2999–3006. https://doi.org/10.1145/3340531.3412778Google ScholarDigital Library
- Guibing Guo, Jie Zhang, Zhu Sun, and Neil Yorke-Smith. 2015. Librec: A java library for recommender systems.. In UMAP workshops, Vol. 4. Citeseer, 38–45.Google Scholar
- Kilol Gupta, Mukund Yelahanka Raghuprasad, and Pankhuri Kumar. 2018. A Hybrid Variational Autoencoder for Collaborative Filtering. CoRR abs/1808.01006 (2018). arXiv:1808.01006http://arxiv.org/abs/1808.01006Google Scholar
- F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets: History and Context. ACM Trans. Interact. Intell. Syst. 5, 4, Article 19 (dec 2015), 19 pages. https://doi.org/10.1145/2827872Google ScholarDigital Library
- Dietmar Jannach, Markus Zanker, Mouzhi Ge, and Marian Gröning. 2012. Recommender Systems in Computer Science and Information Systems – A Landscape of Research. In E-Commerce and Web Technologies, Christian Huemer and Pasquale Lops (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 76–87.Google Scholar
- Dawen Liang, Rahul G. Krishnan, Matthew D. Hoffman, and Tony Jebara. 2018. Variational Autoencoders for Collaborative Filtering. In Proceedings of the 2018 World Wide Web Conference (Lyon, France) (WWW ’18). International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, CHE, 689–698. https://doi.org/10.1145/3178876.3186150Google ScholarDigital Library
- Marco Rossetti, Fabio Stella, and Markus Zanker. 2016. Contrasting Offline and Online Results When Evaluating Recommendation Algorithms. In Proceedings of the 10th ACM Conference on Recommender Systems (Boston, Massachusetts, USA) (RecSys ’16). Association for Computing Machinery, New York, NY, USA, 31–34. https://doi.org/10.1145/2959100.2959176Google ScholarDigital Library
- Harald Steck. 2019. Embarrassingly Shallow Autoencoders for Sparse Data. In The World Wide Web Conference (San Francisco, CA, USA) (WWW ’19). Association for Computing Machinery, New York, NY, USA, 3251–3257. https://doi.org/10.1145/3308558.3313710Google ScholarDigital Library
- Zhu Sun, Di Yu, Hui Fang, Jie Yang, Xinghua Qu, Jie Zhang, and Cong Geng. 2020. Are We Evaluating Rigorously? Benchmarking Recommendation for Reproducible Evaluation and Fair Comparison. In Proceedings of the 14th ACM Conference on Recommender Systems.Google ScholarDigital Library
- Longqi Yang, Yin Cui, Yuan Xuan, Chenyang Wang, Serge Belongie, and Deborah Estrin. 2018. Unbiased Offline Recommender Evaluation for Missing-Not-at-Random Implicit Feedback. In Proceedings of the 12th ACM Conference on Recommender Systems (Vancouver, British Columbia, Canada) (RecSys ’18). Association for Computing Machinery, New York, NY, USA, 279–287. https://doi.org/10.1145/3240323.3240355Google ScholarDigital Library
- Zygmunt Zajac. 2017. Goodbooks-10k: a new dataset for book recommendations. http://fastml.com/goodbooks-10k. FastML (2017).Google Scholar
- Eva Zangerle and Christine Bauer. 2022. Evaluating Recommender Systems: Survey and Framework. ACM Comput. Surv. 55, 8, Article 170 (dec 2022), 38 pages. https://doi.org/10.1145/3556536Google ScholarDigital Library
Index Terms
- EasyStudy: Framework for Easy Deployment of User Studies on Recommender Systems
Recommendations
Acquiring User Information Needs for Recommender Systems
WI-IAT '13: Proceedings of the 2013 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT) - Volume 03Most recommender systems attempt to use collaborative filtering, content-based filtering or hybrid approach to recommend items to new users. Collaborative filtering recommends items to new users based on their similar neighbours, and content-based ...
User Personality and User Satisfaction with Recommender Systems
In this study, we show that individual users' preferences for the level of diversity, popularity, and serendipity in recommendation lists cannot be inferred from their ratings alone. We demonstrate that we can extract strong signals about individual ...
Investigating serendipity in recommender systems based on real user feedback
SAC '18: Proceedings of the 33rd Annual ACM Symposium on Applied ComputingOver the past several years, research in recommender systems has emphasized the importance of serendipity, but there is still no consensus on the definition of this concept and whether serendipitous items should be recommended is still not a well-...
Comments