Fingerprinting Ratings for Collaborative Filtering — Theoretical and Empirical Analysis

  • Conference paper
String Processing and Information Retrieval (SPIRE 2010)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6393)

Abstract

We consider fingerprinting methods for collaborative filtering (CF) systems. In general, CF systems show their real strength when supplied with enormous data sets. Earlier work has already suggested sketching techniques for handling massive amounts of information, but most prior analysis has been limited to non-ranking application scenarios and has focused mainly on theory. We demonstrate how to use fingerprinting methods to compute a family of rank correlation coefficients. Our methods allow identifying users who have similar rankings over a certain set of items, a problem that lies at the heart of CF applications. We show that our method approximates rank correlations with high accuracy and confidence. We examine the suggested methods empirically through a recommender system for the Netflix dataset, showing that the required fingerprint sizes are even smaller than the theoretical analysis suggests. We also explore the use of standard hash functions rather than min-wise independent hashes, and the relation between the quality of the final recommendations and the fingerprint size.
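To make the idea of a fingerprint concrete, the following is a minimal sketch of a min-hash style fingerprint built from a standard hash function (MD5 with a per-position seed) instead of a min-wise independent family. It is not the estimator from the paper: it only approximates the Jaccard overlap between two users' sets of (item, rating) pairs, which is a simpler quantity than a rank correlation. The function names, the seeding scheme, and the default fingerprint size are all assumptions made for illustration.

```python
import hashlib


def _seeded_hash(seed: int, element: str) -> int:
    """Standard hash (MD5) salted with a seed; stands in for one hash function."""
    data = f"{seed}:{element}".encode("utf-8")
    # Use the first 8 bytes of the digest as a 64-bit integer.
    return int.from_bytes(hashlib.md5(data).digest()[:8], "big")


def fingerprint(ratings: dict, k: int = 256) -> list:
    """Return a k-entry fingerprint of a non-empty {item: rating} mapping.

    Entry j is the minimum seeded hash over all (item, rating) pairs.
    """
    if not ratings:
        raise ValueError("ratings must be non-empty")
    elements = [f"{item}={rating}" for item, rating in ratings.items()]
    return [min(_seeded_hash(seed, e) for e in elements) for seed in range(k)]


def estimate_similarity(fp_a: list, fp_b: list) -> float:
    """Fraction of matching entries; approximates the Jaccard similarity."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    matches = sum(1 for a, b in zip(fp_a, fp_b) if a == b)
    return matches / len(fp_a)


def exact_jaccard(ratings_a: dict, ratings_b: dict) -> float:
    """Exact Jaccard similarity of the two sets of (item, rating) pairs."""
    set_a, set_b = set(ratings_a.items()), set(ratings_b.items())
    return len(set_a & set_b) / len(set_a | set_b)


if __name__ == "__main__":
    # Two hypothetical users who agree on the 100 items they both rated.
    user_a = {item: (item % 5) + 1 for item in range(150)}
    user_b = {item: (item % 5) + 1 for item in range(50, 200)}
    fp_a, fp_b = fingerprint(user_a, k=512), fingerprint(user_b, k=512)
    print("exact:    ", exact_jaccard(user_a, user_b))
    print("estimated:", estimate_similarity(fp_a, fp_b))
```

Each user is reduced to k integers regardless of how many items they rated, and two users are compared by touching only those k values. Increasing k tightens the estimate at the cost of a larger fingerprint, which is the trade-off between fingerprint size and quality that the abstract refers to.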

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bachrach, Y., Herbrich, R. (2010). Fingerprinting Ratings for Collaborative Filtering — Theoretical and Empirical Analysis. In: Chavez, E., Lonardi, S. (eds) String Processing and Information Retrieval. SPIRE 2010. Lecture Notes in Computer Science, vol 6393. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16321-0_3

  • DOI: https://doi.org/10.1007/978-3-642-16321-0_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-16320-3

  • Online ISBN: 978-3-642-16321-0

  • eBook Packages: Computer Science, Computer Science (R0)
