
Data Sparsity Issues in the Collaborative Filtering Framework

  • Conference paper
Advances in Web Mining and Web Usage Analysis (WebKDD 2005)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4198)


Abstract

With the amount of information available on the Web growing rapidly every day, the need to filter it automatically, so that users can work more efficiently, has emerged. Within the fields of user profiling and Web personalization, several popular content filtering techniques have been developed. In this chapter we present one such technique: collaborative filtering. Apart from giving an overview of collaborative filtering approaches, we present experimental results comparing the k-Nearest Neighbor (kNN) algorithm with the Support Vector Machine (SVM) in the collaborative filtering framework, using datasets with different properties. While kNN is the algorithm usually used for collaborative filtering tasks, SVM is considered a state-of-the-art classification algorithm. Since collaborative filtering can also be interpreted as a classification/regression task, virtually any supervised learning algorithm (such as SVM) can be applied to it. Experiments were performed on two standard, publicly available datasets and on a real-life corporate dataset that does not fit the profile of ideal data for collaborative filtering. We conclude that the quality of collaborative filtering recommendations is highly dependent on the sparsity of the available data. Furthermore, we show that kNN is dominant on datasets with relatively low sparsity, while SVM-based approaches may perform better on highly sparse data.
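The abstract's two views of collaborative filtering (neighbor-based prediction and a supervised classification/regression task) can be illustrated with a small, self-contained Python sketch. It assumes a toy user-to-item rating dictionary (hypothetical data, not one of the chapter's datasets), measures sparsity as the fraction of empty cells in the user-item matrix, predicts a rating with a generic user-based kNN (Pearson similarity, mean-centered weighted average), and recasts the same data as a training set for one item, whose rows could then be passed to any classifier or regressor such as an SVM. The function names, the neighborhood size `k`, and filling missing ratings with 0 are illustrative assumptions, not the chapter's exact experimental setup.

```python
from math import sqrt
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical sparse rating store: user -> {item: rating}. Absent keys are unknown.
Ratings = Dict[str, Dict[str, float]]

# Toy data for illustration only (not from the chapter's datasets).
RATINGS: Ratings = {
    "alice": {"i1": 5, "i2": 3, "i3": 4},
    "bob": {"i1": 4, "i2": 2, "i3": 4, "i4": 5},
    "carol": {"i1": 1, "i2": 5, "i4": 2},
    "dave": {"i2": 3, "i3": 5, "i4": 4},
}
ITEMS: Set[str] = {"i1", "i2", "i3", "i4"}


def sparsity(ratings: Ratings, items: Set[str]) -> float:
    """Fraction of empty cells in the user-item matrix (1.0 means no ratings)."""
    observed = sum(len(r) for r in ratings.values())
    return 1.0 - observed / (len(ratings) * len(items))


def pearson(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Pearson correlation over co-rated items; 0.0 if fewer than two overlap."""
    common = a.keys() & b.keys()
    if len(common) < 2:
        return 0.0
    ma = sum(a[i] for i in common) / len(common)
    mb = sum(b[i] for i in common) / len(common)
    num = sum((a[i] - ma) * (b[i] - mb) for i in common)
    den = sqrt(sum((a[i] - ma) ** 2 for i in common)) * sqrt(
        sum((b[i] - mb) ** 2 for i in common)
    )
    return num / den if den else 0.0


def knn_predict(ratings: Ratings, user: str, item: str, k: int = 2) -> Optional[float]:
    """Generic user-based kNN: the user's mean rating plus the similarity-weighted
    average deviation of the k most similar (positively correlated) neighbors
    who rated `item`. Returns None when no such neighbor exists."""
    target = ratings[user]
    user_mean = sum(target.values()) / len(target)
    candidates: List[Tuple[float, float]] = []
    for other, r in ratings.items():
        if other == user or item not in r:
            continue
        w = pearson(target, r)
        if w > 0:  # keep only positively correlated neighbors
            candidates.append((w, r[item] - sum(r.values()) / len(r)))
    neighbors = sorted(candidates, reverse=True)[:k]  # highest similarity first
    norm = sum(w for w, _ in neighbors)
    if norm == 0:
        return None
    return user_mean + sum(w * dev for w, dev in neighbors) / norm


def item_training_set(
    ratings: Ratings, target_item: str, items: Set[str], fill: float = 0.0
) -> Tuple[List[str], List[List[float]], List[float]]:
    """Recast CF for one item as supervised learning: each user who rated
    `target_item` is one example, features are that user's ratings of the other
    items (missing -> `fill`, an assumption), and the label is the target rating."""
    features = sorted(items - {target_item})
    X: List[List[float]] = []
    y: List[float] = []
    for r in ratings.values():
        if target_item in r:
            X.append([r.get(i, fill) for i in features])
            y.append(r[target_item])
    return features, X, y


if __name__ == "__main__":
    print(sparsity(RATINGS, ITEMS))             # 3 of 16 cells empty -> 0.1875
    print(knn_predict(RATINGS, "alice", "i4"))  # about 4.58
    print(knn_predict(RATINGS, "carol", "i3"))  # None: no positively correlated neighbor
    print(item_training_set(RATINGS, "i4", ITEMS))
```

In this toy data, `knn_predict` returns None for carol on item i3 because none of the users who rated i3 correlates positively with her. The Pearson similarity above also returns 0 whenever two users share fewer than two rated items, so as the matrix grows sparser, fewer usable neighbors survive; this is one plausible intuition for the sparsity sensitivity the abstract reports, not a result taken from the chapter. The rows produced by `item_training_set`, by contrast, exist for every user who rated the target item regardless of overlap, which is what lets a supervised learner be trained on them.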




Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Grčar, M., Mladenič, D., Fortuna, B., Grobelnik, M. (2006). Data Sparsity Issues in the Collaborative Filtering Framework. In: Nasraoui, O., Zaïane, O., Spiliopoulou, M., Mobasher, B., Masand, B., Yu, P.S. (eds) Advances in Web Mining and Web Usage Analysis. WebKDD 2005. Lecture Notes in Computer Science, vol 4198. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11891321_4


  • DOI: https://doi.org/10.1007/11891321_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-46346-7

  • Online ISBN: 978-3-540-46348-1

  • eBook Packages: Computer Science, Computer Science (R0)
