A comparative analysis of similarity measures akin to the Jaccard index in collaborative recommendations: empirical and theoretical perspective

  • Original Article
Social Network Analysis and Mining

Abstract

The Jaccard index, originally proposed by Jaccard (Bull Soc Vaudoise Sci Nat 37:241–272, 1901), measures the similarity (or dissimilarity) between two sample data objects. It is defined as the ratio of the intersection size to the union size of the two samples, which makes it a simple and intuitive similarity measure. This research examines measures akin to the Jaccard index that may be used to model affinity between users (or items) in collaborative recommendations. In particular, the simple matching coefficient (SMC), Sorensen–Dice coefficient (SDC), Salton’s cosine index (SCI), and overlap coefficient (OLC) are compared with the Jaccard index and analysed from both theoretical and empirical perspectives. Because these measures capture only structural similarity (overlap information) between the data samples, they are most useful when only the associations between users and items are available, such as users’ browsing or buying behaviour on an e-commerce portal (i.e. unary rating data, a special case of ratings). Furthermore, a theoretical relation among these measures is established. We also derive an equivalent expression for each measure, stated in data mining/machine learning terms, so that it can be applied directly to binary data samples. To compare and validate the effectiveness of these structural similarity measures, several experiments were conducted on standard benchmark datasets (MovieLens, FilmTrust, Epinions, Yahoo! Movies, and Yahoo! Music). The empirical results show that Salton’s cosine index (SCI) provides better accuracy (in terms of MAE, RMSE, and precision) for large datasets, whereas the overlap coefficient (OLC) yields more accurate recommendations for small datasets.
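
As a rough illustration of the five measures named in the abstract, the sketch below computes each of them twice: once for two users represented as sets of item identifiers, and once from the match counts of two binary vectors. The formulas are the standard textbook definitions, not expressions taken from the paper, whose body is not reproduced here. The function names, the `n_items` parameter, and the example data are hypothetical.

```python
from math import sqrt


def set_similarities(a, b, n_items):
    """Structural similarity of two non-empty item sets.

    a, b    : sets of item ids each user has interacted with (unary data)
    n_items : size of the item universe; only SMC needs it, because SMC
              also counts items that neither user touched
    """
    if not a or not b:
        raise ValueError("both sets must be non-empty")
    inter = len(a & b)
    union = len(a | b)
    if n_items < union:
        raise ValueError("n_items is smaller than the union of the sets")
    return {
        "jaccard": inter / union,
        "smc": (inter + (n_items - union)) / n_items,
        "dice": 2 * inter / (len(a) + len(b)),
        "cosine": inter / sqrt(len(a) * len(b)),
        "overlap": inter / min(len(a), len(b)),
    }


def binary_similarities(x, y):
    """Same measures from the match counts of two equal-length 0/1 vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    m11 = sum(1 for p, q in zip(x, y) if p and q)      # both 1
    m10 = sum(1 for p, q in zip(x, y) if p and not q)  # only x is 1
    m01 = sum(1 for p, q in zip(x, y) if not p and q)  # only y is 1
    m00 = len(x) - m11 - m10 - m01                     # both 0
    if m11 + m10 == 0 or m11 + m01 == 0:
        raise ValueError("each vector needs at least one 1")
    return {
        "jaccard": m11 / (m11 + m10 + m01),
        "smc": (m11 + m00) / (m11 + m10 + m01 + m00),
        "dice": 2 * m11 / (2 * m11 + m10 + m01),
        "cosine": m11 / sqrt((m11 + m10) * (m11 + m01)),
        "overlap": m11 / min(m11 + m10, m11 + m01),
    }


if __name__ == "__main__":
    # Hypothetical users over a universe of 10 items numbered 1..10.
    u, v = {1, 2, 3, 4}, {3, 4, 5}
    for name, value in set_similarities(u, v, n_items=10).items():
        print(f"{name}: {value:.4f}")
```

With these definitions, the overlap-only measures are ordered Jaccard ≤ Sorensen–Dice ≤ cosine ≤ overlap for any pair of non-empty sets, because the denominators satisfy min(|A|, |B|) ≤ √(|A||B|) ≤ (|A| + |B|)/2 ≤ |A ∪ B|. This is a well-known ordering and is given here only as an example of the kind of relation that can hold among these measures; it may not be the specific relation the authors establish.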

References

  • Aggarwal CC (2016) Recommender systems: the textbook, 1st edn. Springer, Berlin

  • Ahn HJ (2008) A new similarity measure for collaborative filtering to alleviate the new user cold-starting problem. Inf Sci (NY) 178(1):37–51

  • Al Hassanieh L, Jaoudeh CA, Abdo JB, Demerjian J (2018) Similarity measures for collaborative filtering recommender systems. In: 2018 IEEE Middle East and North Africa communications conference (MENACOMM), pp 1–5

  • Al-bashiri H, Abdulgabber MA, Romli A, Hujainah F (2017) Collaborative filtering similarity measures: revisiting. In: ACM international conference proceeding series, vol Part F1312, pp 195–200

  • Arsan T, Koksal E, Bozkus Z (2016) Comparison of collaborative filtering algorithms with various similarity measures for movie recommendation. Int J Comput Sci Eng Appl 6(3):1–20

  • Bag S, Kumar SK, Tiwari MK (2019) An efficient recommendation generation using relevant Jaccard similarity. Inf Sci (NY) 483:53–64

  • Balabanović M, Shoham Y (1997) Fab: content-based, collaborative recommendation. Commun ACM 40(3):66–72

  • Billsus D, Pazzani MJ (1998) Learning collaborative information filters. In: Proceedings of the fifteenth international conference on machine learning, vol 54, p 48

  • Billsus D, Pazzani MJ (2002) User modeling for adaptive news access. User Model User Adapt Interact 10:147–180

  • Bobadilla J, Serradilla F, Bernal J (2010) A new collaborative filtering metric that improves the behavior of recommender systems. Knowl-Based Syst 23(6):520–528

  • Bobadilla J, Ortega F, Hernando A, Arroyo Á (2012a) A balanced memory-based collaborative filtering similarity measure. Int J Intell Syst 27(10):939–946

  • Bobadilla J, Hernando A, Ortega F, Gutiérrez A (2012b) Collaborative filtering based on significances. Inf Sci (NY) 185(1):1–17

  • Bobadilla J, Ortega F, Hernando A (2012c) A collaborative filtering similarity measure based on singularities. Inf Process Manag 48(2):204–217

  • Breese JS, Heckerman D, Kadie C (1998) Empirical analysis of predictive algorithms for collaborative filtering. In: Proceedings of the 14th conference on uncertainty in artificial intelligence, vol 461, no 8, pp 43–52

  • Burke R (2002) Hybrid recommender systems: survey and experiments. User Model User-Adapted Interact

  • Chai T, Draxler RR (2014) Root mean square error (RMSE) or mean absolute error (MAE)? Arguments against avoiding RMSE in the literature. Geosci Model Dev 7(3):1247–1250

  • Dice LR (1945) Measures of the amount of ecologic association between species. Ecology 26(3):297–302

  • Ekstrand MD (2011) Collaborative filtering recommender systems. Found Trends Hum Comput Interact 4(2):81–173

  • Epinions Trust Network Datasets. http://www.trustlet.org/epinions.html. Accessed 16 May 2020

  • Facebook. https://www.facebook.com/. Accessed 18 Jun 2019

  • Getoor L, Sahami M (1999) Using probabilistic relational models for collaborative filtering. Work. Web Usage Anal. User Profiling

  • Goldberg D, Nichols D, Oki BM, Terry D (1992) Using collaborative filtering to weave an information tapestry. Commun ACM 35(12):61–70

  • Guo G, Zhang J, Yorke-Smith N (2013) A novel Bayesian similarity measure for recommender systems. In: Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI), 2013, pp 2619–2625

  • Han J, Kamber M, Pei J (2011) Data mining: concepts and techniques, 3rd edn. Morgan Kaufmann, San Francisco

  • Harper FM, Konstan JA (2015) The MovieLens Datasets. ACM Trans Interact Intell Syst 5(4):1–19

  • Herlocker J, Riedl J (2002) An empirical analysis of design choices in neighborhood-based collaborative filtering algorithms. Inf Retr Boston 2002:287–310

  • Herlocker JL, Konstan JA, Borchers A, Riedl J (1999) An algorithmic framework for performing collaborative filtering. In: Proceedings of the 22nd annual international ACM SIGIR conference on research and development in information retrieval-SIGIR’99, 1999, pp 230–237

  • Hill W, Stead L, Rosenstein M, Furnas G (1995) Recommending and evaluating choices in a virtual community of use. In: Proceedings of the SIGCHI conference on Human factors in computing systems-CHI’95

  • Hofmann T (2003) Collaborative filtering via Gaussian probabilistic latent semantic analysis. In: Proceedings of the 26th annual international ACM SIGIR conference on research and development in information retrieval-SIGIR’03, p 259

  • Jaccard P (1901) Distribution comparée de la flore alpine dans quelques régions des Alpes occidentales et orientales. Bull Soc Vaudoise Sci Nat 37:241–272

  • Joaquin D, Naohiro I (1999) Memory-based weighted-majority prediction for recommender systems. Res Dev Inf Retr

  • Konstan JA, Miller BN, Maltz D, Herlocker JL, Gordon LR, Riedl J (1997) GroupLens: applying collaborative filtering to Usenet news. Commun ACM 40(3):77–87

  • Laghmari K, Marsala C, Ramdani M (2018) An adapted incremental graded multi-label classification model for recommendation systems. Prog Artif Intell 7(1):15–29

  • Lang K (1995) NewsWeeder: learning to filter netnews. In: Proceedings of the 12th international machine learning conference

  • Liu H, Hu Z, Mian A, Tian H, Zhu X (2014) A new user similarity model to improve the accuracy of collaborative filtering. Knowl-Based Syst 56:156–166

  • Marlin B (2003) Modeling user rating profiles for collaborative filtering. In: Proceedings of the 16th international conference on neural information processing systems, 2003, pp 627–634

  • Massa P, Avesani P (2007) Trust-aware recommender systems. In: Proceedings of the 2007 ACM conference on recommender systems, 2007, pp 17–24

  • MovieLens|GroupLens. https://grouplens.org/datasets/movielens/. Accessed 22 Dec 2018

  • Nakamura A, Abe N (1998) Collaborative filtering using weighted majority prediction algorithms. In: Proceedings of the fifteenth international conference on machine learning, 1998, pp 395–403

  • Ortega F, Zhu B, Bobadilla J, Hernando A (2018) CF4J: collaborative filtering for Java. Knowl-Based Syst 152:94–99

  • Owen S, Anil R, Dunning T, Friedman E (2011) Mahout in action. Manning Publications Co., Greenwich

  • Patra BK, Launonen R, Ollikainen V, Nandi S (2015) A new similarity measure using Bhattacharyya coefficient for collaborative filtering in sparse data. Knowl-Based Syst 82:163–177

  • Pavlov D, Pennock D (2002) A maximum entropy approach to collaborative filtering in dynamic, sparse, high-dimensional domains. Proc Neural Inf Process Syst 2002:1441–1448

  • Resnick P, Varian HR (1997) Recommender systems. Commun ACM 40(3):56–58

  • Resnick P, Iacovou N, Suchak M, Bergstrom P, Riedl J (1994) GroupLens: an open architecture for collaborative filtering of netnews. In: Proceedings of the 1994 ACM conference on computer supported cooperative work, 1994, pp 175–186

  • Ricci F, Rokach L, Shapira B, Kantor PB (2010) Recommender systems handbook, 1st edn. Springer, Berlin

  • Salton G, McGill M (1983) Introduction to modern information retrieval, pp 375–384

  • Sarwar B, Karypis G, Konstan J, Riedl J (2001) Item-based collaborative filtering recommendation algorithms. In: Proceedings of the tenth international conference on world wide web-WWW’01, pp 285–295

  • Science C, Wnek J (1997) Learning and revising user profiles: the identification of interesting web sites. Mach Learn 331:313–331

  • Shardanand U, Maes P (1995) Social information filtering: algorithms for automating ‘word of mouth’. In: Proceedings of the SIGCHI conference on human factors in computing systems-CHI’95, pp 210–217

  • Shi Y, Larson M, Hanjalic A (2014) Collaborative filtering beyond the user-item matrix: a survey of the state of the art and future challenges. ACM Comput Surv 47(1):1–45

  • Sondur SD, Nayak S, Chigadani AP (2016) Similarity measures for recommender systems: a comparative study. Int J Sci Res Dev 2(3):76–80

  • Sorensen T (1948) A method of establishing groups of equal amplitude in plant sociology based on similarity of species content. Det Kong Danske Vidensk Selesk Biol Skr 5(1):1–34

  • Stephen SC, Xie H, Rai S (2017) Measures of similarity in memory-based collaborative filtering recommender system—a comparison. In: ACM international conference proceeding series, vol Part F1296, 2017

  • Su X, Khoshgoftaar TM (2009) A survey of collaborative filtering techniques. Adv Artif Intell 2009(Section 3):1–19

  • Suganeshwari G, Syed Ibrahim SP (2018) A comparison study on similarity measures in collaborative filtering algorithms for movie recommendation. Int J Pure Appl Math 119(15 Special Issue C):1495–1505

  • Sun SB et al (2017) Integrating triangle and Jaccard similarities for recommendation. PLoS ONE 12(8):1–16

  • Vijaymeena MK, Kavitha K (2016) A survey on similarity measures in text mining. Mach Learn Appl Int J 3(1):19–28

  • Webscope |Yahoo Labs. https://webscope.sandbox.yahoo.com/. Accessed 16 May 2020

Acknowledgements

We would like to thank the anonymous reviewers for their comments and suggestions on the previous version, which considerably improved the manuscript.

Author information

Corresponding author

Correspondence to Vijay Verma.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Verma, V., Aggarwal, R.K. A comparative analysis of similarity measures akin to the Jaccard index in collaborative recommendations: empirical and theoretical perspective. Soc. Netw. Anal. Min. 10, 43 (2020). https://doi.org/10.1007/s13278-020-00660-9

  • DOI: https://doi.org/10.1007/s13278-020-00660-9
