Abstract
Collaborative Filtering (CF)-based recommender systems benefit both users and the operators of sites that offer more information than users can manage. Users benefit because they can find items of interest among an unmanageable number of available items. E-commerce sites that employ recommender systems, in turn, can increase sales revenue in at least two ways: a) by drawing customers’ attention to items they are likely to buy, and b) by cross-selling items. However, the sheer number of customers and items typical of e-commerce systems demands specially designed CF algorithms that can gracefully cope with the vast size of the data. Many algorithms proposed thus far, whose principal concern is recommendation quality, may be too expensive to operate in a large-scale system. We propose ClustKnn, a simple and intuitive algorithm that is well suited to large data sets. The method first compresses the data dramatically by building a straightforward but efficient clustering model. Recommendations are then generated quickly using a simple Nearest Neighbor-based approach. We demonstrate the feasibility of ClustKnn both analytically and empirically. We also show, by comparison with a number of other popular CF algorithms, that, apart from being highly scalable and intuitive, ClustKnn provides very good recommendation accuracy.
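The two-phase idea described above can be sketched as follows. An offline step compresses the user base into a small set of cluster centroids. An online step then makes a nearest-neighbor prediction against those centroids instead of against every user. This is a minimal illustration under stated assumptions, not the authors' ClustKnn implementation. Plain k-means with cosine similarity stands in for whatever clustering procedure and similarity measure the full paper uses. Each centroid's rating for an item is assumed to be the mean rating of the cluster members who rated it. All function names (`build_clusters`, `predict`, etc.) are hypothetical.

```python
import math
import random
from typing import Dict, List

Profile = Dict[str, float]  # item id -> rating


def cosine(a: Profile, b: Profile) -> float:
    """Cosine similarity of two sparse rating vectors (missing ratings count as 0)."""
    dot = sum(r * b[i] for i, r in a.items() if i in b)
    na = math.sqrt(sum(r * r for r in a.values()))
    nb = math.sqrt(sum(r * r for r in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def mean(p: Profile) -> float:
    """Average rating in a profile (0.0 for an empty profile)."""
    return sum(p.values()) / len(p) if p else 0.0


def centroid(members: List[Profile]) -> Profile:
    """Assumed centroid rule: per-item average over the members who rated that item."""
    sums: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for p in members:
        for i, r in p.items():
            sums[i] = sums.get(i, 0.0) + r
            counts[i] = counts.get(i, 0) + 1
    return {i: sums[i] / counts[i] for i in sums}


def build_clusters(users: Dict[str, Profile], k: int,
                   iters: int = 10, seed: int = 0) -> List[Profile]:
    """Offline step (stand-in): plain k-means over user profiles, returning k centroids."""
    rng = random.Random(seed)
    profiles = list(users.values())
    centroids = [dict(p) for p in rng.sample(profiles, k)]
    for _ in range(iters):
        groups: List[List[Profile]] = [[] for _ in range(k)]
        for p in profiles:
            # Assign each user to the most similar centroid.
            best = max(range(k), key=lambda c: cosine(p, centroids[c]))
            groups[best].append(p)
        # Recompute centroids; an empty cluster keeps its previous centroid.
        centroids = [centroid(g) if g else centroids[c] for c, g in enumerate(groups)]
    return centroids


def predict(profile: Profile, centroids: List[Profile], item: str, n: int = 2) -> float:
    """Online step: mean-offset kNN prediction from the n most similar centroids that rated `item`."""
    candidates = [c for c in centroids if item in c]
    scored = sorted(((cosine(profile, c), c) for c in candidates),
                    key=lambda t: t[0], reverse=True)[:n]
    num = sum(s * (c[item] - mean(c)) for s, c in scored)
    den = sum(abs(s) for s, _ in scored)
    return mean(profile) + (num / den if den else 0.0)
```

A small usage example with made-up ratings:

```python
users = {
    "alice": {"m1": 5, "m2": 4, "m3": 1},
    "bob":   {"m1": 4, "m2": 5, "m3": 2, "m4": 5},
    "carol": {"m1": 1, "m2": 2, "m3": 5, "m4": 1},
    "dave":  {"m1": 2, "m3": 4, "m4": 2},
}
cents = build_clusters(users, k=2)
print(predict(users["alice"], cents, "m4", n=1))
```

The design choice this illustrates is that the online similarity search scans only `k` centroids rather than every user, so its cost grows with the size of the model rather than with the size of the user base.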
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Rashid, A.M., Lam, S.K., LaPitz, A., Karypis, G., Riedl, J. (2007). Towards a Scalable kNN CF Algorithm: Exploring Effective Applications of Clustering. In: Nasraoui, O., Spiliopoulou, M., Srivastava, J., Mobasher, B., Masand, B. (eds) Advances in Web Mining and Web Usage Analysis. WebKDD 2006. Lecture Notes in Computer Science(), vol 4811. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77485-3_9
DOI: https://doi.org/10.1007/978-3-540-77485-3_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-77484-6
Online ISBN: 978-3-540-77485-3
eBook Packages: Computer Science, Computer Science (R0)