Towards a Scalable kNN CF Algorithm: Exploring Effective Applications of Clustering

  • Conference paper
Advances in Web Mining and Web Usage Analysis (WebKDD 2006)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4811)

Abstract

Collaborative Filtering (CF)-based recommender systems benefit both users and the operators of sites that carry more information than a visitor can sift through. Users can find items of interest among an unmanageable number of available items. E-commerce sites that employ recommender systems, in turn, can increase sales revenue in at least two ways: a) by drawing customers’ attention to items that they are likely to buy, and b) by cross-selling items. However, the sheer number of customers and items typical of e-commerce systems demands specially designed CF algorithms that can gracefully cope with the vast size of the data. Many algorithms proposed thus far, where the principal concern is recommendation quality, may be too expensive to operate in a large-scale system. We propose ClustKnn, a simple and intuitive algorithm that is well suited to large data sets. The method first compresses the data greatly by building a straightforward but efficient clustering model. Recommendations are then generated quickly using a simple nearest-neighbor approach. We demonstrate the feasibility of ClustKnn both analytically and empirically. We also show, by comparison with a number of other popular CF algorithms, that ClustKnn is not only highly scalable and intuitive but also provides very good recommendation accuracy.
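
The two-stage idea in the abstract (cluster first, then run nearest-neighbor prediction against the compressed model) can be illustrated with a minimal sketch. The abstract does not specify the clustering method, the similarity measure, or the prediction formula, so everything concrete here is an assumption made for illustration: plain k-means over sparse user rating vectors, cosine similarity on co-rated items, and a similarity-weighted average over the closest centroids. The function names (`build_model`, `predict`) and the toy ratings are hypothetical and are not taken from the paper.

```python
import random
from math import sqrt


def similarity(a, b):
    """Cosine similarity restricted to items present in both sparse vectors."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(a[i] ** 2 for i in common))
    norm_b = sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(members):
    """Per-item mean rating, averaged over the members who rated the item."""
    sums, counts = {}, {}
    for vec in members:
        for item, rating in vec.items():
            sums[item] = sums.get(item, 0.0) + rating
            counts[item] = counts.get(item, 0) + 1
    return {item: sums[item] / counts[item] for item in sums}


def build_model(ratings, k, iterations=10, seed=0):
    """Stage 1 (assumed: plain k-means): compress all users into k centroids."""
    rng = random.Random(seed)
    users = list(ratings)
    centroids = [dict(ratings[u]) for u in rng.sample(users, k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for u in users:
            best = max(range(k), key=lambda c: similarity(ratings[u], centroids[c]))
            groups[best].append(ratings[u])
        # An empty cluster keeps its previous centroid.
        centroids = [centroid(g) if g else centroids[c] for c, g in enumerate(groups)]
    return centroids


def predict(centroids, profile, item, n_neighbors=2):
    """Stage 2: treat centroids as surrogate users and average the nearest ones."""
    scored = [(similarity(profile, c), c[item]) for c in centroids if item in c]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    top = [(s, r) for s, r in scored[:n_neighbors] if s > 0]
    if not top:
        return None  # no usable neighbor rated this item
    return sum(s * r for s, r in top) / sum(s for s, _ in top)


# Toy data (hypothetical): two groups of users with opposite tastes.
ratings = {
    "u1": {"a": 5, "b": 5, "c": 1},
    "u2": {"a": 5, "b": 4, "c": 1},
    "u3": {"a": 1, "b": 1, "c": 5},
    "u4": {"a": 1, "b": 2, "c": 5},
}
model = build_model(ratings, k=2)
print(predict(model, {"a": 5, "c": 1}, "b", n_neighbors=1))
```

In this sketch a prediction compares the active user's profile with k centroids instead of with every stored user, so its cost depends on the number of clusters rather than on the number of customers. That is the kind of saving the abstract attributes to the clustering step, though the paper's own analysis is not reproduced here.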

Editor information

Olfa Nasraoui, Myra Spiliopoulou, Jaideep Srivastava, Bamshad Mobasher, Brij Masand

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Rashid, A.M., Lam, S.K., LaPitz, A., Karypis, G., Riedl, J. (2007). Towards a Scalable kNN CF Algorithm: Exploring Effective Applications of Clustering. In: Nasraoui, O., Spiliopoulou, M., Srivastava, J., Mobasher, B., Masand, B. (eds) Advances in Web Mining and Web Usage Analysis. WebKDD 2006. Lecture Notes in Computer Science, vol 4811. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77485-3_9

  • DOI: https://doi.org/10.1007/978-3-540-77485-3_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-77484-6

  • Online ISBN: 978-3-540-77485-3

  • eBook Packages: Computer Science, Computer Science (R0)
