Recommender Systems Evaluation

Bellogín, Alejandro; Said, Alan

doi:10.1007/978-1-4939-7131-2_110162



Synonyms

Evaluation; Methods; Metrics; Recommendation systems; Reproducibility

Glossary

AUC:: Area under the curve
CF:: Collaborative filtering
CTR:: Click-through rate
DCG:: Discounted cumulative gain
ILD:: Intra-list diversity
IR:: Information retrieval
MAE:: Mean absolute error
MAP:: Mean average precision
ML:: Machine learning
RMSE:: Root-mean-squared error
ROC:: Receiver operating characteristic
RS:: Recommender system

Definition

The evaluation of RSs has been, and still is, an active research topic in the field. Since the advent of the first RS, recommendation performance has usually been equated with the accuracy of rating prediction: estimated ratings are compared against actual ratings, and the differences between them are aggregated using the MAE and RMSE metrics. In terms of the actual utility of recommendations for users, however, there is a growing realization that the quality (precision) of a ranking of recommended items can be more important than the accuracy in...
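The two views of performance contrasted above can be sketched in a few lines of Python: MAE and RMSE measure how far predicted ratings are from actual ratings, while precision at a cutoff k measures how many of the top-k recommended items are relevant. This is a minimal illustration, not the entry's own formulation; it assumes predictions are available as (predicted, actual) rating pairs and that the set of relevant items for a user is known, for example from held-out test data. The function names and the toy data are hypothetical.

```python
import math
from typing import Sequence, Set, Tuple


def mae(pairs: Sequence[Tuple[float, float]]) -> float:
    """Mean absolute error over (predicted, actual) rating pairs."""
    if not pairs:
        raise ValueError("need at least one rating pair")
    return sum(abs(pred - actual) for pred, actual in pairs) / len(pairs)


def rmse(pairs: Sequence[Tuple[float, float]]) -> float:
    """Root-mean-squared error; penalizes large errors more than MAE."""
    if not pairs:
        raise ValueError("need at least one rating pair")
    return math.sqrt(sum((pred - actual) ** 2 for pred, actual in pairs) / len(pairs))


def precision_at_k(ranking: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k ranked items that are relevant.

    The denominator is always k (assumed convention), so a ranking shorter
    than k is penalized for the missing positions.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for item in ranking[:k] if item in relevant)
    return hits / k


if __name__ == "__main__":
    # Hypothetical (predicted, actual) ratings on a 1-5 scale.
    pairs = [(3.5, 4.0), (2.0, 1.0), (4.5, 5.0), (3.0, 3.0)]
    print("MAE:", mae(pairs))    # 0.5
    print("RMSE:", rmse(pairs))  # about 0.612

    # Hypothetical ranked recommendation list and test-set relevant items.
    ranking = ["i3", "i7", "i1", "i9", "i4"]
    relevant = {"i1", "i4", "i8"}
    print("P@5:", precision_at_k(ranking, relevant, 5))  # 0.4
```

Note that the error metrics need a predicted score for every test rating, whereas precision only looks at which items reach the top of the list; a system can therefore improve one family of metrics without improving the other.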



Author information

Authors and Affiliations

Universidad Autónoma de Madrid, Ciudad Universitaria de Cantoblanco, 28049, Madrid, Spain
Alejandro Bellogín
School of Informatics, University of Skövde, Högskolevägen, Box 408, 541 28, Skövde, Sweden
Alan Said


Corresponding author

Correspondence to Alejandro Bellogín.

Editor information

Editors and Affiliations

Department of Computer Science, University of Calgary, Calgary, AB, Canada
Reda Alhajj
Department of Computer Science, University of Calgary, Calgary, AB, Canada
Jon Rokne

Section Editor information

Department of Computer Science, University of Bari "Aldo Moro", Bari, Italy
Giovanni Semeraro
Bari, Italy
Cataldo Musto, Ph.D.


About this entry

Cite this entry

Bellogín, A., Said, A. (2018). Recommender Systems Evaluation. In: Alhajj, R., Rokne, J. (eds) Encyclopedia of Social Network Analysis and Mining. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-7131-2_110162


DOI: https://doi.org/10.1007/978-1-4939-7131-2_110162
Published: 12 June 2018
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4939-7130-5
Online ISBN: 978-1-4939-7131-2
eBook Packages: Computer Science, Reference Module Computer Science and Engineering
