research-article

Impact of data characteristics on recommender systems performance

Published: 10 April 2012

Abstract

This article investigates how rating data characteristics affect the performance of several popular recommendation algorithms, including user-based and item-based collaborative filtering as well as matrix factorization. We focus on three groups of data characteristics: rating space, rating frequency distribution, and rating value distribution. A sampling procedure was used to obtain rating data subsamples with varying characteristics; the recommendation algorithms were then run to estimate predictive accuracy on each subsample; and linear regression models were fit to uncover the relationships between data characteristics and recommendation accuracy. Experimental results on multiple rating datasets show consistent and significant effects of several data characteristics on recommendation accuracy.
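The abstract describes a sample, measure, and regress workflow without giving formulas. The Python sketch below shows one illustrative way to compute characteristics of the three named groups. It is a hedged sketch, not the article's method: the `(user_id, item_id, value)` tuple format, the specific measures (matrix size, users-per-item shape, density, a Gini coefficient over item rating counts, and the population standard deviation of rating values), the random user/item subsampling scheme, and every function name are assumptions made for illustration. Estimating accuracy itself (running user-based CF, item-based CF, or matrix factorization on each subsample) is omitted, and `fit_line` is a single-predictor simplification of the regression step.

```python
import random
import statistics
from collections import Counter

# Hypothetical representation (assumption): a rating is a (user_id, item_id, value) tuple.


def rating_space(ratings):
    """Size, shape, and density of the user-item matrix (illustrative definitions)."""
    users = {u for u, _, _ in ratings}
    items = {i for _, i, _ in ratings}
    size = len(users) * len(items)  # number of cells in the user-item matrix
    return {
        "size": size,
        "shape": len(users) / len(items),  # users per item
        "density": len(ratings) / size,    # fraction of cells that hold a rating
    }


def gini(counts):
    """Gini coefficient: 0 for perfect equality, (n-1)/n when one element holds everything."""
    xs = sorted(counts)
    n, total = len(xs), sum(xs)
    weighted = sum((k + 1) * x for k, x in enumerate(xs))
    return 2 * weighted / (n * total) - (n + 1) / n


def item_frequency_gini(ratings):
    """How unevenly ratings are spread across items (assumed frequency measure)."""
    return gini(Counter(i for _, i, _ in ratings).values())


def rating_std(ratings):
    """Population standard deviation of the rating values (assumed value measure)."""
    return statistics.pstdev([v for _, _, v in ratings])


def draw_subsample(ratings, n_users, n_items, rng):
    """Keep only ratings among randomly chosen users and items (hypothetical scheme)."""
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    keep_u = set(rng.sample(users, min(n_users, len(users))))
    keep_i = set(rng.sample(items, min(n_items, len(items))))
    return [r for r in ratings if r[0] in keep_u and r[1] in keep_i]


def fit_line(xs, ys):
    """One-predictor ordinary least squares; returns (slope, intercept)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


if __name__ == "__main__":
    toy = [("u1", "i1", 4), ("u1", "i2", 5), ("u2", "i1", 3)]
    print(rating_space(toy))  # {'size': 4, 'shape': 1.0, 'density': 0.75}
    print(item_frequency_gini(toy), rating_std(toy))
    print(draw_subsample(toy, 1, 2, random.Random(0)))
```

In a full experiment one would draw many subsamples, compute these characteristics and an accuracy score (for example RMSE) for each, and regress accuracy on the characteristics; a multivariate least-squares fit would replace `fit_line` there. Note that `rating_space` divides by the matrix size, so an empty subsample would need to be skipped.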



Published In

ACM Transactions on Management Information Systems, Volume 3, Issue 1
April 2012
119 pages
ISSN: 2158-656X
EISSN: 2158-6578
DOI: 10.1145/2151163
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 10 April 2012
Accepted: 01 January 2012
Revised: 01 October 2011
Received: 01 April 2011
Published in TMIS Volume 3, Issue 1


Author Tags

  1. performance of recommender systems
  2. accuracy of recommendation algorithms
  3. collaborative filtering
  4. data characteristics

Qualifiers

  • Research-article
  • Research
  • Refereed


Article Metrics

  • Downloads (last 12 months): 109
  • Downloads (last 6 weeks): 20
Reflects downloads up to 01 Mar 2025

Cited By

  • (2025) Understanding the influence of data characteristics on the performance of point-of-interest recommendation algorithms. Information Technology & Tourism. DOI: 10.1007/s40558-024-00304-0. Online publication date: 3-Jan-2025
  • (2024) Characteristics of the Learning Data of a Session-Based Recommendation System and their Impact on the Performance of the System. Proceedings of the 32nd International Conference on Information Systems Development. DOI: 10.62036/ISD.2024.24. Online publication date: 2024
  • (2024) Product Soft Landing of Experience Products and the Role of Pre-release Advertising Responsiveness. SSRN Electronic Journal. DOI: 10.2139/ssrn.4713109. Online publication date: 2024
  • (2024) Consumer Acquisition for Recommender Systems. Information Systems Research 35:1, 339-362. DOI: 10.1287/isre.2023.1229. Online publication date: 1-Mar-2024
  • (2024) Consumer Social Connectedness and Persuasiveness of Collaborative-Filtering Recommender Systems: Evidence From an Online-to-Offline Recommendation App. Production and Operations Management. DOI: 10.1177/10591478241259422. Online publication date: 25-Jul-2024
  • (2024) Recommender Systems Algorithm Selection for Ranking Prediction on Implicit Feedback Datasets. Proceedings of the 18th ACM Conference on Recommender Systems, 1163-1167. DOI: 10.1145/3640457.3691718. Online publication date: 8-Oct-2024
  • (2024) Informed Dataset Selection with ‘Algorithm Performance Spaces’. Proceedings of the 18th ACM Conference on Recommender Systems, 1085-1090. DOI: 10.1145/3640457.3691704. Online publication date: 8-Oct-2024
  • (2024) A Novel Evaluation Perspective on GNNs-based Recommender Systems through the Topology of the User-Item Graph. Proceedings of the 18th ACM Conference on Recommender Systems, 549-559. DOI: 10.1145/3640457.3688070. Online publication date: 8-Oct-2024
  • (2024) LEV4REC: A feature-based approach to engineering RSSEs. Journal of Computer Languages 78, 101256. DOI: 10.1016/j.cola.2023.101256. Online publication date: Mar-2024
  • (2024) A survey on popularity bias in recommender systems. User Modeling and User-Adapted Interaction 34:5, 1777-1834. DOI: 10.1007/s11257-024-09406-0. Online publication date: 1-Nov-2024
