Abstract
Collaborative Filtering (CF) is an important technique for recommendation systems that models and analyzes customer preferences in order to make reasonable recommendations. Recently, many applications based on the Restricted Boltzmann Machine (RBM) have been developed for a wide variety of learning problems. The RBM-based model for Collaborative Filtering (RBM-CF) can handle large-scale data sets and achieves good recommendation performance. However, the computation of the RBM becomes expensive when a large number of hidden features is used to improve recommendation accuracy. Although the RBM has great potential for parallelism, developing a parallel implementation of RBM-CF on a GPU remains a challenge, because the data sets for CF are typically large and sparse. In this paper, we propose a parallel implementation of RBM-CF on the GPU using CUDA. We first show how to transform the computation of RBM-CF into matrix-based operations on the GPU, and then present three CUDA kernels for sparse matrix-matrix multiplication that further improve the computational efficiency of RBM-CF when modeling large-scale, sparse data sets. Experimental results show that our parallel implementation achieves significant speedups on the GPU.
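To make the matrix-based formulation concrete, the following is a minimal CPU reference sketch in Python, not the CUDA kernels of the paper. It assumes the sparse visible data are stored in CSR form (the abstract does not state the storage format) and that the hidden units are plain sigmoid units as in a standard RBM. The function names `csr_times_dense` and `hidden_probabilities` are hypothetical and chosen only for illustration.

```python
import math


def csr_times_dense(indptr, indices, data, dense, n_cols_out):
    """Multiply a CSR sparse matrix by a dense matrix (list of rows).

    Assumption: CSR layout with row pointers `indptr`, column indices
    `indices`, and nonzero values `data`.
    """
    n_rows = len(indptr) - 1
    out = [[0.0] * n_cols_out for _ in range(n_rows)]
    # Each output row depends only on its own nonzeros, so rows are
    # independent; that independence is what a GPU version could exploit.
    for row in range(n_rows):
        for p in range(indptr[row], indptr[row + 1]):
            col, val = indices[p], data[p]
            dense_row = dense[col]
            for j in range(n_cols_out):
                out[row][j] += val * dense_row[j]
    return out


def hidden_probabilities(indptr, indices, data, weights, hidden_bias):
    """Return sigmoid(V @ W + b) for a CSR visible matrix V."""
    pre = csr_times_dense(indptr, indices, data, weights, len(hidden_bias))
    return [
        [1.0 / (1.0 + math.exp(-(x + b))) for x, b in zip(row, hidden_bias)]
        for row in pre
    ]


# Toy data (made up for illustration): 2 users, 3 visible units, 2 hidden units.
# Dense form of V is [[1, 0, 1], [0, 1, 0]].
indptr = [0, 2, 3]
indices = [0, 2, 1]
data = [1.0, 1.0, 1.0]
weights = [[0.5, -1.0], [1.0, 0.0], [-0.5, 2.0]]
hidden_bias = [0.0, 0.0]

probs = hidden_probabilities(indptr, indices, data, weights, hidden_bias)
```

Only the stored nonzeros are visited, so the work scales with the number of observed ratings and not with the full user-by-item matrix. This is the property that makes a sparse formulation attractive for CF data.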
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Cai, X., Xu, Z., Lai, G., Wu, C., Lin, X. (2012). GPU-Accelerated Restricted Boltzmann Machine for Collaborative Filtering. In: Xiang, Y., Stojmenovic, I., Apduhan, B.O., Wang, G., Nakano, K., Zomaya, A. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2012. Lecture Notes in Computer Science, vol 7439. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33078-0_22
DOI: https://doi.org/10.1007/978-3-642-33078-0_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33077-3
Online ISBN: 978-3-642-33078-0
eBook Packages: Computer Science, Computer Science (R0)