Abstract
Collaborative filtering has become popular for recommendation systems. However, as the volume of data grows rapidly, constructing a similarity matrix becomes a performance bottleneck in such systems. The MapReduce framework proposed by Google has recently been widely used for data-intensive applications. In this work, we therefore propose ConSimMR, an efficient parallel algorithm for constructing a similarity matrix using MapReduce. We first partition the set of items into disjoint groups such that items rated by similar users tend to be placed in the same group. We next compute the similarity of every pair of items belonging to the same group. Finally, we calculate the similarity of every pair of items belonging to different groups. In this step, by using the rating list of each user rather than that of each item, we can compute the similarities in parallel, which improves performance. We conducted experiments on real-life data sets comparing ConSimMR with previous algorithms and confirmed the efficiency and scalability of ConSimMR.
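The idea of computing item similarities from per-user rating lists can be illustrated with a minimal single-process sketch. It is not the ConSimMR algorithm itself: the function names (map_user, cosine_similarities), the input layout (a dictionary from user to that user's item ratings), and the choice of cosine similarity are all assumptions made for illustration, since the abstract does not fix a similarity measure. The point it shows is that each user's list can be processed independently, as a map task would, with the partial products summed afterwards, as a reduce task would.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def map_user(ratings):
    """Map step (hypothetical): for one user's rating list, emit one
    partial product per pair of items that the user rated."""
    for i, j in combinations(sorted(ratings), 2):
        yield (i, j), ratings[i] * ratings[j]


def cosine_similarities(user_ratings):
    """Reduce step (hypothetical): sum the partial products per item pair
    and normalize by the item rating norms (cosine similarity, an
    assumption; unrated entries are treated as zero)."""
    dot = defaultdict(float)      # (item_i, item_j) -> sum of rating products
    norm_sq = defaultdict(float)  # item -> sum of squared ratings
    for ratings in user_ratings.values():
        # Each user's list is handled independently of every other user's.
        for item, r in ratings.items():
            norm_sq[item] += r * r
        for pair, product in map_user(ratings):
            dot[pair] += product
    sims = {}
    for (i, j), d in dot.items():
        denom = sqrt(norm_sq[i]) * sqrt(norm_sq[j])
        if denom > 0:  # skip items whose ratings are all zero
            sims[(i, j)] = d / denom
    return sims


if __name__ == "__main__":
    toy = {
        "u1": {"a": 1.0, "b": 1.0},
        "u2": {"a": 1.0, "b": 1.0, "c": 2.0},
    }
    for pair, sim in sorted(cosine_similarities(toy).items()):
        print(pair, round(sim, 4))
```

Only item pairs that share at least one rating user are ever emitted, so pairs with no common user never appear in the output. In a real MapReduce job the two loops would be split into separate map and reduce functions keyed by item pair.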

Cite this article
Kim, S., Kim, H. & Min, JK. An efficient parallel similarity matrix construction on MapReduce for collaborative filtering. J Supercomput 75, 123–141 (2019). https://doi.org/10.1007/s11227-018-2271-3
DOI: https://doi.org/10.1007/s11227-018-2271-3