research-article

Common Pitfalls in Training and Evaluating Recommender Systems

Published: 01 September 2017

Abstract

This paper formally presents four common pitfalls in training and evaluating recommendation algorithms for information systems. Specifically, we show that separating server logs into training and test data for model generation and model evaluation can be problematic if the training and test data are selected improperly. In addition, we show that click-through rate, a common metric for measuring and comparing the performance of different recommendation algorithms, may not be a good measure of profitability, that is, the income a recommendation module brings to a website. Moreover, we demonstrate that evaluating recommendation revenue may not be as straightforward as it first appears. Unfortunately, these pitfalls have appeared in many previous studies on recommender systems and information systems. We explicitly explain these problems and propose methods to address them. We conducted experiments to support our claims. Finally, we review previous papers and competitions that may suffer from these problems.
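
The gap between click-through rate and profitability can be illustrated with a toy calculation. The sketch below is not the paper's method or data: the `Impression` record, the two helper functions, and both logs are hypothetical, and the numbers are invented purely to show how the two metrics can rank two algorithms in opposite order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Impression:
    """One displayed recommendation (hypothetical record layout)."""
    clicked: bool
    revenue: float  # income attributed to this impression; 0.0 if none


def click_through_rate(log):
    """Fraction of impressions that were clicked."""
    if not log:
        raise ValueError("log must contain at least one impression")
    return sum(imp.clicked for imp in log) / len(log)


def revenue_per_impression(log):
    """Average income per displayed recommendation."""
    if not log:
        raise ValueError("log must contain at least one impression")
    return sum(imp.revenue for imp in log) / len(log)


# Invented logs, 10 impressions each.
# Algorithm A draws many clicks, but most of them earn nothing.
algo_a = (
    [Impression(True, 0.0)] * 3
    + [Impression(True, 1.0)]
    + [Impression(False, 0.0)] * 6
)
# Algorithm B draws fewer clicks, but each click earns more.
algo_b = [Impression(True, 5.0)] * 2 + [Impression(False, 0.0)] * 8

for name, log in (("A", algo_a), ("B", algo_b)):
    print(name, click_through_rate(log), revenue_per_impression(log))
```

With these made-up logs, algorithm A has the higher click-through rate while algorithm B has the higher revenue per impression, so picking a winner by click-through rate alone would select the less profitable algorithm.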



Published In

ACM SIGKDD Explorations Newsletter, Volume 19, Issue 1
June 2017
59 pages
ISSN:1931-0145
EISSN:1931-0153
DOI:10.1145/3137597

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 September 2017
Published in SIGKDD Volume 19, Issue 1
