Recommender Systems Evaluation

  • Reference work entry
Encyclopedia of Social Network Analysis and Mining

Synonyms

Evaluation; Methods; Metrics; Recommendation systems; Reproducibility

Glossary

AUC: Area under the curve
CF: Collaborative filtering
CTR: Click-through rate
DCG: Discounted cumulative gain
ILD: Intra-list diversity
IR: Information retrieval
MAE: Mean absolute error
MAP: Mean average precision
ML: Machine learning
RMSE: Root-mean-squared error
ROC: Receiver operating characteristic
RS: Recommender system

Definition

The evaluation of RSs has been, and still is, an object of active research in the field. Since the advent of the first RSs, recommendation performance has usually been equated with the accuracy of rating prediction: estimated ratings are compared against actual ratings, and the differences between them are summarized by means of the MAE and RMSE metrics. In terms of the effective utility of recommendations for users, however, there is an increasing realization that the quality (precision) of a ranking of recommended items can be more important than the accuracy in...
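The rating-prediction comparison described above can be illustrated with a minimal Python sketch. It uses the standard textbook definitions of MAE and RMSE; the function names `mae` and `rmse` and the sample ratings are assumptions made for illustration and do not come from the entry itself.

```python
import math


def _check(predicted, actual):
    # Both lists must be non-empty and aligned item by item.
    if len(predicted) != len(actual) or len(actual) == 0:
        raise ValueError("predicted and actual must be non-empty and of equal length")


def mae(predicted, actual):
    """Mean absolute error between estimated and actual ratings."""
    _check(predicted, actual)
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root-mean-squared error between estimated and actual ratings."""
    _check(predicted, actual)
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


# Hypothetical ratings on a 1-5 scale, invented for this example.
estimated = [4.0, 3.0, 5.0, 2.0]
observed = [5.0, 3.0, 4.0, 4.0]

print(mae(estimated, observed))   # average size of the prediction errors
print(rmse(estimated, observed))  # penalizes large errors more strongly than MAE
```

Because RMSE squares each difference before averaging, a single large error raises it more than it raises MAE, which is one reason the two metrics are often reported together.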

Author information

Correspondence to Alejandro Bellogín.

Copyright information

© 2018 Springer Science+Business Media LLC, part of Springer Nature

About this entry

Cite this entry

Bellogín, A., Said, A. (2018). Recommender Systems Evaluation. In: Alhajj, R., Rokne, J. (eds) Encyclopedia of Social Network Analysis and Mining. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-7131-2_110162