A comparative analysis of similarity measures akin to the Jaccard index in collaborative recommendations: empirical and theoretical perspective

Verma, Vijay; Aggarwal, Rajesh Kumar

doi:10.1007/s13278-020-00660-9

Original Article
Published: 09 June 2020

Volume 10, article number 43 (2020)

Social Network Analysis and Mining

2733 Accesses
39 Citations

Abstract

The Jaccard index, originally proposed by Jaccard (Bull Soc Vaudoise Sci Nat 37:241–272, 1901), is a measure of the similarity (or dissimilarity) between two sample data objects. It is defined as the ratio of the size of the intersection to the size of the union of the two data samples, which makes it a very simple and intuitive measure of similarity. This research examines measures that are akin to the Jaccard index and can be used to model the affinity between users (or items) in collaborative recommendations. In particular, the simple matching coefficient (SMC), the Sorensen–Dice coefficient (SDC), Salton’s cosine index (SCI), and the overlap coefficient (OLC) are compared with the Jaccard index and analysed from both theoretical and empirical perspectives. Because these measures capture only structural similarity information (the overlap between data samples), they are especially useful when only the associations between users and items are available, such as users' browsing or buying behaviour on an e-commerce portal (i.e. unary rating data, a special case of ratings). Furthermore, a theoretical relation among these measures has been established. We have also derived an equivalent expression for each measure so that it can be applied directly to binary data samples, as the term is used in data mining and machine learning. To compare and validate the effectiveness of these structural similarity measures, several experiments have been conducted on standard benchmark datasets (MovieLens, FilmTrust, Epinions, Yahoo! Movies, and Yahoo! Music). The empirical results show that Salton’s cosine index (SCI) provides better accuracy (in terms of MAE, RMSE, and precision) on large datasets, whereas the overlap coefficient (OLC) yields more accurate recommendations on small datasets.
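As an illustration of the binary-data forms mentioned above, the following Python sketch (not the authors' code) computes the five measures from the standard 2×2 contingency counts of two 0/1 vectors: a (both 1), b (first 1, second 0), c (first 0, second 1) and d (both 0). It assumes the textbook definitions JI = a/(a+b+c), SMC = (a+d)/(a+b+c+d), SDC = 2a/(2a+b+c), SCI = a/√((a+b)(a+c)) and OLC = a/min(a+b, a+c); the function names and the convention of returning 0.0 when a denominator is zero are assumptions made for this example. For unary data, a user's profile is simply the set of items the user interacted with, so the set forms give the same values for JI, SDC, SCI and OLC; SMC additionally needs d, the items neither user touched.

```python
import math
from typing import Dict, Sequence, Set, Tuple


def contingency(x: Sequence[int], y: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return the counts (a, b, c, d) for two equal-length 0/1 vectors.

    a: both 1; b: x is 1, y is 0; c: x is 0, y is 1; d: both 0.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    a = sum(1 for u, v in zip(x, y) if u and v)
    b = sum(1 for u, v in zip(x, y) if u and not v)
    c = sum(1 for u, v in zip(x, y) if v and not u)
    d = len(x) - a - b - c
    return a, b, c, d


def _safe_div(num: float, den: float) -> float:
    # Assumed convention for this sketch: similarity is 0.0 when undefined.
    return num / den if den else 0.0


def binary_measures(x: Sequence[int], y: Sequence[int]) -> Dict[str, float]:
    """Similarity of two binary vectors computed from their contingency counts."""
    a, b, c, d = contingency(x, y)
    return {
        "JI": _safe_div(a, a + b + c),
        "SMC": _safe_div(a + d, a + b + c + d),
        "SDC": _safe_div(2 * a, 2 * a + b + c),
        "SCI": _safe_div(a, math.sqrt((a + b) * (a + c))),
        "OLC": _safe_div(a, min(a + b, a + c)),
    }


def set_measures(p: Set[int], q: Set[int]) -> Dict[str, float]:
    """Set-based forms; SMC is omitted because it needs the full item universe."""
    inter = len(p & q)
    return {
        "JI": _safe_div(inter, len(p | q)),
        "SDC": _safe_div(2 * inter, len(p) + len(q)),
        "SCI": _safe_div(inter, math.sqrt(len(p) * len(q))),
        "OLC": _safe_div(inter, min(len(p), len(q))),
    }


if __name__ == "__main__":
    # Hypothetical unary profiles of two users over 8 items (1 = interacted).
    u = [1, 1, 1, 0, 0, 1, 0, 0]
    v = [1, 1, 0, 1, 0, 0, 0, 0]
    # Prints, one per line: JI: 0.400, SMC: 0.625, SDC: 0.571, SCI: 0.577, OLC: 0.667
    for name, value in binary_measures(u, v).items():
        print(f"{name}: {value:.3f}")
    # The same profiles as item-index sets give identical JI, SDC, SCI and OLC.
    print(set_measures({0, 1, 2, 5}, {0, 1, 3}))
```

For the hypothetical profiles in the demo, a = 2, b = 2, c = 1 and d = 3, so JI = 2/5, SMC = 5/8, SDC = 4/7, SCI = 2/√12 ≈ 0.577 and OLC = 2/3. These values respect the ordering JI ≤ SDC ≤ SCI ≤ OLC, which holds for any pair of non-empty profiles: the denominators of SDC, SCI and OLC are the arithmetic mean, the geometric mean and the minimum of the two profile sizes, and JI = SDC/(2 − SDC) with 2 − SDC ≥ 1. This is a general property of the formulas, not necessarily the exact theoretical relation derived in the paper. SMC falls outside the ordering because it also rewards shared absences (d), which usually make up most entries in sparse rating data.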

Similar content being viewed by others

A systematic review and research perspective on recommender systems (Deepjyoti Roy & Mala Dutta; Article, Open access, 03 May 2022)

Recommender Systems: Techniques, Applications, and Challenges

Advances in Collaborative Filtering

References

Aggarwal CC (2016) Recommender systems: the textbook, 1st edn. Springer, Berlin
Ahn HJ (2008) A new similarity measure for collaborative filtering to alleviate the new user cold-starting problem. Inf Sci (NY) 178(1):37–51
Al Hassanieh L, Jaoudeh CA, Abdo JB, Demerjian J (2018) Similarity measures for collaborative filtering recommender systems. In: 2018 IEEE Middle East North Africa communications conference (MENACOMM), pp 1–5
Al-bashiri H, Abdulgabber MA, Romli A, Hujainah F (2017) Collaborative filtering similarity measures: revisiting. In: ACM international conference proceeding series, vol Part F1312, pp 195–200
Arsan T, Koksal E, Bozkus Z (2016) Comparison of collaborative filtering algorithms with various similarity measures for movie recommendation. Int J Comput Sci Eng Appl 6(3):1–20
Bag S, Kumar SK, Tiwari MK (2019) An efficient recommendation generation using relevant Jaccard similarity. Inf Sci (NY) 483:53–64
Balabanović M, Shoham Y (1997) Fab: content-based, collaborative recommendation. Commun ACM 40(3):66–72
Billsus D, Pazzani MJ (1998) Learning collaborative information filters. In: Proceedings of the fifteenth international conference on machine learning, vol 54, p 48
Billsus D, Pazzani MJ (2002) User modeling for adaptive news access. User Model User-Adapt Interact 10:147–180
Bobadilla J, Serradilla F, Bernal J (2010) A new collaborative filtering metric that improves the behavior of recommender systems. Knowl-Based Syst 23(6):520–528
Bobadilla J, Ortega F, Hernando A, Arroyo Á (2012a) A balanced memory-based collaborative filtering similarity measure. Int J Intell Syst 27(10):939–946
Bobadilla J, Hernando A, Ortega F, Gutiérrez A (2012b) Collaborative filtering based on significances. Inf Sci (NY) 185(1):1–17
Bobadilla J, Ortega F, Hernando A (2012c) A collaborative filtering similarity measure based on singularities. Inf Process Manag 48(2):204–217
Breese JS, Heckerman D, Kadie C (1998) Empirical analysis of predictive algorithms for collaborative filtering. In: Proceedings of the 14th conference on uncertainty in artificial intelligence, vol 461, no 8, pp 43–52
Burke R (2002) Hybrid recommender systems: survey and experiments. User Model User-Adapt Interact
Chai T, Draxler RR (2014) Root mean square error (RMSE) or mean absolute error (MAE)? Arguments against avoiding RMSE in the literature. Geosci Model Dev 7(3):1247–1250
Dice LR (1945) Measures of the amount of ecologic association between species. Ecology 26(3):297–302
Ekstrand MD (2011) Collaborative filtering recommender systems. Found Trends Hum Comput Interact 4(2):81–173
Epinions Trust Network Datasets. http://www.trustlet.org/epinions.html. Accessed 16 May 2020
Facebook. https://www.facebook.com/. Accessed 18 Jun 2019
Getoor L, Sahami M (1999) Using probabilistic relational models for collaborative filtering. In: Workshop on web usage analysis and user profiling
Goldberg D, Nichols D, Oki BM, Terry D (1992) Using collaborative filtering to weave an information tapestry. Commun ACM 35(12):61–70
Guo G, Zhang J, Yorke-Smith N (2013) A novel Bayesian similarity measure for recommender systems. In: Proceedings of the 23rd international joint conference on artificial intelligence (IJCAI), pp 2619–2625
Han J, Kamber M, Pei J (2011) Data mining: concepts and techniques, 3rd edn. Morgan Kaufmann, San Francisco
Harper FM, Konstan JA (2015) The MovieLens datasets. ACM Trans Interact Intell Syst 5(4):1–19
Herlocker J, Riedl J (2002) An empirical analysis of design choices in neighborhood-based collaborative filtering algorithms. Inf Retr Boston 2002:287–310
Herlocker JL, Konstan JA, Borchers A, Riedl J (1999) An algorithmic framework for performing collaborative filtering. In: Proceedings of the 22nd annual international ACM SIGIR conference on research and development in information retrieval (SIGIR’99), pp 230–237
Hill W, Stead L, Rosenstein M, Furnas G (1995) Recommending and evaluating choices in a virtual community of use. In: Proceedings of the SIGCHI conference on human factors in computing systems (CHI’95)
Hofmann T (2003) Collaborative filtering via Gaussian probabilistic latent semantic analysis. In: Proceedings of the 26th annual international ACM SIGIR conference on research and development in information retrieval (SIGIR’03), p 259
Jaccard P (1901) Distribution comparée de la flore alpine dans quelques régions des Alpes occidentales et orientales. Bull Soc Vaudoise Sci Nat 37:241–272
Joaquin D, Naohiro I (1999) Memory-based weighted-majority prediction for recommender systems. Res Dev Inf Retr
Konstan JA, Miller BN, Maltz D, Herlocker JL, Gordon LR, Riedl J (1997) GroupLens: applying collaborative filtering to Usenet news. Commun ACM 40(3):77–87
Laghmari K, Marsala C, Ramdani M (2018) An adapted incremental graded multi-label classification model for recommendation systems. Prog Artif Intell 7(1):15–29
Lang K (1995) NewsWeeder: learning to filter netnews. In: Proceedings of the 12th international machine learning conference
Liu H, Hu Z, Mian A, Tian H, Zhu X (2014) A new user similarity model to improve the accuracy of collaborative filtering. Knowl-Based Syst 56:156–166
Marlin B (2003) Modeling user rating profiles for collaborative filtering. In: Proceedings of the 16th international conference on neural information processing systems, pp 627–634
Massa P, Avesani P (2007) Trust-aware recommender systems. In: Proceedings of the 2007 ACM conference on recommender systems, pp 17–24
MovieLens | GroupLens. https://grouplens.org/datasets/movielens/. Accessed 22 Dec 2018
Nakamura A, Abe N (1998) Collaborative filtering using weighted majority prediction algorithms. In: Proceedings of the fifteenth international conference on machine learning, pp 395–403
Ortega F, Zhu B, Bobadilla J, Hernando A (2018) CF4J: collaborative filtering for Java. Knowl-Based Syst 152:94–99
Owen S, Anil R, Dunning T, Friedman E (2011) Mahout in action. Manning Publications Co., Greenwich
Patra BK, Launonen R, Ollikainen V, Nandi S (2015) A new similarity measure using Bhattacharyya coefficient for collaborative filtering in sparse data. Knowl-Based Syst 82:163–177
Pavlov D, Pennock D (2002) A maximum entropy approach to collaborative filtering in dynamic, sparse, high-dimensional domains. Proc Neural Inf Process Syst 2002:1441–1448
Resnick P, Varian HR (1997) Recommender systems. Commun ACM 40(3)
Resnick P, Iacovou N, Suchak M, Bergstrom P, Riedl J (1994) GroupLens: an open architecture for collaborative filtering of netnews. In: Proceedings of the 1994 ACM conference on computer supported cooperative work, pp 175–186
Ricci F, Rokach L, Shapira B, Kantor PB (2010) Recommender systems handbook, 1st edn. Springer, Berlin
Salton G, McGill M (1983) Introduction to modern information retrieval, pp 375–384
Sarwar B, Karypis G, Konstan J, Riedl J (2001) Item-based collaborative filtering recommendation algorithms. In: Proceedings of the tenth international conference on world wide web (WWW’01), pp 285–295
Science C, Wnek J (1997) Learning and revising user profiles: the identification of interesting web sites. Mach Learn 331:313–331
Shardanand U, Maes P (1995) Social information filtering: algorithms for automating ‘word of mouth’. In: Proceedings of the SIGCHI conference on human factors in computing systems (CHI’95), pp 210–217
Shi Y, Larson M, Hanjalic A (2014) Collaborative filtering beyond the user-item matrix: a survey of the state of the art and future challenges. ACM Comput Surv 47(1):1–45
Sondur SD, Nayak S, Chigadani AP (2016) Similarity measures for recommender systems: a comparative study. Int J Sci Res Dev 2(3):76–80
Sorensen T (1948) A method of establishing groups of equal amplitude in plant sociology based on similarity of species content. Det Kong Danske Vidensk Selsk Biol Skr 5(1):1–34
Stephen SC, Xie H, Rai S (2017) Measures of similarity in memory-based collaborative filtering recommender system: a comparison. In: ACM international conference proceeding series, vol Part F1296
Su X, Khoshgoftaar TM (2009) A survey of collaborative filtering techniques. Adv Artif Intell 2009:1–19
Suganeshwari G, Syed Ibrahim SP (2018) A comparison study on similarity measures in collaborative filtering algorithms for movie recommendation. Int J Pure Appl Math 119(15 Special Issue C):1495–1505
Sun SB et al (2017) Integrating triangle and Jaccard similarities for recommendation. PLoS ONE 12(8):1–16
Vijaymeena MK, Kavitha K (2016) A survey on similarity measures in text mining. Mach Learn Appl Int J 3(1):19–28
Webscope | Yahoo Labs. https://webscope.sandbox.yahoo.com/. Accessed 16 May 2020

Acknowledgements

We would like to thank the anonymous reviewers for their comments, which considerably improved the manuscript. We are also grateful for their suggestions on the previous version of the manuscript.

Author information

Authors and Affiliations

Computer Engineering Department, National Institute of Technology, Kurukshetra, Haryana, 136119, India
Vijay Verma & Rajesh Kumar Aggarwal

Corresponding author

Correspondence to Vijay Verma.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Verma, V., Aggarwal, R.K. A comparative analysis of similarity measures akin to the Jaccard index in collaborative recommendations: empirical and theoretical perspective. Soc. Netw. Anal. Min. 10, 43 (2020). https://doi.org/10.1007/s13278-020-00660-9

Received: 31 August 2019
Revised: 18 May 2020
Accepted: 26 May 2020
Published: 09 June 2020
DOI: https://doi.org/10.1007/s13278-020-00660-9
