Abstract
Neighborhood-based collaborative filtering (CF) methods are widely used in recommender systems because they are easy to implement and highly effective. A significant challenge for these methods is scaling with the growing amount of data, since finding nearest neighbors requires a search over all of the data. Approximate nearest neighbor (ANN) methods avoid this exhaustive search by examining only the data points that are likely to be similar. Locality sensitive hashing (LSH) is a well-known technique for ANN search in high-dimensional spaces, and it is also effective in addressing the scalability problem of neighborhood-based CF. In this study, we propose novel improvements to current LSH-based recommender algorithms and systematically evaluate LSH in neighborhood-based CF. We also conduct extensive experiments on real-life datasets to investigate various LSH parameters and their effects on multiple metrics used to evaluate recommender systems. Our proposed algorithms run faster than standard LSH-based applications while keeping prediction accuracy within reasonable limits. The proposed algorithms also have a large positive impact on aggregate diversity, which has recently become an important evaluation measure for recommender algorithms.
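To make the idea of replacing an exhaustive neighbor search with hashing concrete, the following is a minimal sketch of LSH-based candidate selection for user-based CF. It assumes the random-hyperplane (sign of random projection) hash family for cosine similarity; the class name `HyperplaneLSH`, the helper `nearest_neighbors`, the parameters `num_tables` and `num_bits`, and the toy ratings are all hypothetical and chosen for illustration. It is not the algorithm proposed in this study, only a baseline illustration of the general technique.

```python
import math
import random
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two sparse rating vectors (item -> rating)."""
    dot = sum(r * v[i] for i, r in u.items() if i in v)
    norm_u = math.sqrt(sum(r * r for r in u.values()))
    norm_v = math.sqrt(sum(r * r for r in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


class HyperplaneLSH:
    """Hypothetical helper: several hash tables, each keyed by a few sign bits."""

    def __init__(self, items, num_tables=4, num_bits=3, seed=0):
        rng = random.Random(seed)
        # planes[t][b] holds one Gaussian coordinate per item, i.e. one random
        # hyperplane. `items` must cover every item that appears in the ratings.
        self.planes = [
            [{i: rng.gauss(0.0, 1.0) for i in sorted(items)} for _ in range(num_bits)]
            for _ in range(num_tables)
        ]
        self.tables = [defaultdict(set) for _ in range(num_tables)]

    def _key(self, table_idx, vec):
        # One bit per hyperplane: the side of the plane on which the vector lies.
        return tuple(
            sum(r * plane[i] for i, r in vec.items()) >= 0.0
            for plane in self.planes[table_idx]
        )

    def index(self, ratings):
        """Place every user into one bucket per table."""
        for user, vec in ratings.items():
            for t, table in enumerate(self.tables):
                table[self._key(t, vec)].add(user)

    def candidates(self, vec):
        """Union of the buckets that `vec` falls into, across all tables."""
        found = set()
        for t, table in enumerate(self.tables):
            found |= table.get(self._key(t, vec), set())
        return found


def nearest_neighbors(lsh, ratings, user, k=1):
    """Rank only the LSH candidates of `user` by exact cosine similarity."""
    cands = lsh.candidates(ratings[user]) - {user}
    scored = [(other, cosine(ratings[user], ratings[other])) for other in cands]
    return sorted(scored, key=lambda p: p[1], reverse=True)[:k]


# Toy data (made up): users "a" and "b" have identical tastes.
ratings = {
    "a": {"x": 5, "y": 1},
    "b": {"x": 5, "y": 1},
    "c": {"y": 4, "z": 5},
}
lsh = HyperplaneLSH(items={"x", "y", "z"})
lsh.index(ratings)
print(nearest_neighbors(lsh, ratings, "a", k=1))
```

In this sketch, more bits per key make buckets smaller (fewer candidates, faster but less accurate), while more tables raise the chance that truly similar users share at least one bucket. This is the kind of parameter trade-off the experiments described above investigate.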
Notes
Sparsity of a dataset = #ratings / (#items × #users)
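As a small worked example of the ratio defined in this note, the sketch below computes it for made-up counts; the function name `sparsity` and the numbers are illustrative assumptions, not figures from the datasets used in the study. Under this definition, a smaller value means a sparser rating matrix.

```python
def sparsity(num_ratings, num_items, num_users):
    """Ratio of observed ratings to all possible (item, user) pairs, as in the note."""
    return num_ratings / (num_items * num_users)


# Hypothetical dataset: 25,000 ratings from 1,000 users on 500 items.
print(sparsity(25_000, 500, 1_000))
```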
Acknowledgements
This research was supported by the Central Securities Depository Institution (MKK) of Turkish Capital Markets.
Cite this article
Aytekin, A.M., Aytekin, T. Real-time recommendation with locality sensitive hashing. J Intell Inf Syst 53, 1–26 (2019). https://doi.org/10.1007/s10844-019-00552-1
DOI: https://doi.org/10.1007/s10844-019-00552-1