Parallel matrix factorization for recommender systems

  • Regular Paper
  • Published in: Knowledge and Information Systems

Abstract

Matrix factorization, when the matrix has missing values, has become one of the leading techniques for recommender systems. To handle web-scale datasets with millions of users and billions of ratings, scalability becomes an important issue. Alternating least squares (ALS) and stochastic gradient descent (SGD) are two popular approaches to compute matrix factorization, and there has been a recent flurry of activity to parallelize these algorithms. However, due to the cubic time complexity in the target rank, ALS is not scalable to large-scale datasets. On the other hand, SGD conducts efficient updates but usually suffers from slow convergence that is sensitive to the parameters. Coordinate descent, a classical optimization approach, has been used for many other large-scale problems, but its application to matrix factorization for recommender systems has not been thoroughly explored. In this paper, we show that coordinate descent-based methods have a more efficient update rule compared to ALS and have faster and more stable convergence than SGD. We study different update sequences and propose the CCD++ algorithm, which updates rank-one factors one by one. In addition, CCD++ can be easily parallelized on both multi-core and distributed systems. We empirically show that CCD++ is much faster than ALS and SGD in both settings. As an example, with a synthetic dataset containing 14.6 billion ratings, on a distributed memory cluster with 64 processors, to deliver the desired test RMSE, CCD++ is 49 times faster than SGD and 20 times faster than ALS. When the number of processors is increased to 256, CCD++ takes only 16 s and is still 40 times faster than SGD and 20 times faster than ALS.
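
As a concrete illustration of the rank-one update sequence described above, the following single-threaded C sketch refits one rank-one factor (one column of W and one column of H) at a time against the residual of the observed ratings, under the standard L2-regularized squared-error objective. The data layout (a coo_t coordinate-format struct), the function name ccdpp_pass, the column-major storage of W and H, and the externally maintained residual array R are illustrative assumptions made for this sketch; it is not the authors' implementation, which is parallelized with OpenMP and MPI (see the notes below).

#include <stdlib.h>
#include <string.h>

/* Observed ratings in coordinate (COO) form; `val` is only needed to
 * initialize the residual array R before the first pass. */
typedef struct {
    long    nnz;   /* number of observed ratings (i, j, A_ij) */
    int    *row;   /* user index i of each rating             */
    int    *col;   /* item index j of each rating             */
    double *val;   /* rating value A_ij                       */
} coo_t;

/* One outer CCD++-style pass over all k rank-one factors.
 * W is m x k and H is n x k, stored column-major, so column t of W
 * lives at W[t*m .. t*m+m-1].  R holds the current residuals
 * R_s = A_ij - w_i^T h_j for every observed entry s = (i, j); it must
 * be initialized before the first call and stays up to date across passes. */
void ccdpp_pass(const coo_t *A, int m, int n, int k,
                double *W, double *H, double *R,
                double lambda, int inner_iters)
{
    int     dim = m > n ? m : n;
    double *num = calloc(dim, sizeof(double));
    double *den = calloc(dim, sizeof(double));

    for (int t = 0; t < k; ++t) {
        double *u = &W[(long)t * m];   /* t-th column of W */
        double *v = &H[(long)t * n];   /* t-th column of H */

        /* Add back the contribution of rank t, so R now holds the
         * residual with factor t removed (the rank-one subproblem). */
        for (long s = 0; s < A->nnz; ++s)
            R[s] += u[A->row[s]] * v[A->col[s]];

        for (int it = 0; it < inner_iters; ++it) {
            /* Coordinate updates of all u_i with v fixed:
             * u_i = (sum_{j in Omega_i} R_ij v_j) / (lambda + sum v_j^2). */
            memset(num, 0, m * sizeof(double));
            memset(den, 0, m * sizeof(double));
            for (long s = 0; s < A->nnz; ++s) {
                double vj = v[A->col[s]];
                num[A->row[s]] += R[s] * vj;
                den[A->row[s]] += vj * vj;
            }
            for (int i = 0; i < m; ++i)
                u[i] = num[i] / (lambda + den[i]);

            /* Coordinate updates of all v_j with u fixed (symmetric). */
            memset(num, 0, n * sizeof(double));
            memset(den, 0, n * sizeof(double));
            for (long s = 0; s < A->nnz; ++s) {
                double ui = u[A->row[s]];
                num[A->col[s]] += R[s] * ui;
                den[A->col[s]] += ui * ui;
            }
            for (int j = 0; j < n; ++j)
                v[j] = num[j] / (lambda + den[j]);
        }

        /* Remove the refitted rank-one term; R is a full residual again. */
        for (long s = 0; s < A->nnz; ++s)
            R[s] -= u[A->row[s]] * v[A->col[s]];
    }
    free(num);
    free(den);
}

The property the sketch makes visible is that a full sweep over all u_i (or all v_j) costs one pass over the observed entries plus one division per variable, with no k-by-k linear system anywhere; ALS, by contrast, solves such a system per user and per item, which is the cubic dependence on the rank referred to in the abstract.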

Notes

  1. http://mahout.apache.org/.

  2. In [8], the name “Jellyfish” is used.

  3. Intel MKL is used in our implementation of ALS.

  4. We implement a multi-core version of DSGD according to [7].

  5. HogWild is downloaded from http://research.cs.wisc.edu/hazy/victor/Hogwild/ and modified to start from the same initial point as ALS and DSGD.

  6. In HogWild, seven cores are used for SGD updates, and one core is used for random shuffle.

  7. For -s1, initial \(\eta = 0.001\); for -s2, initial \(\eta = 0.05\).

  8. http://openmp.org/.

  9. http://threadingbuildingblocks.org/.

  10. http://www.mcs.anl.gov/research/projects/mpi/.

  11. Our C implementation is 6x faster than the MATLAB version provided by [2].

  12. \(\lambda \bigl( \sum_i |\Omega_{i}| \Vert \mathbf{w}_{i}\Vert^2 + \sum_j |\bar{\Omega}_{j}| \Vert \mathbf{h}_{j}\Vert^2 \bigr)\) is used to replace the regularization term in (1); both regularizers are written out after these notes.

  13. http://www.tacc.utexas.edu/user-services/user-guides/stampede-user-guide#compenv.

  14. We downloaded version 2.1.4679 from https://code.google.com/p/graphlabapi/.
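
To make note 12 concrete, here is the contrast with the plain regularizer, written out under the assumption that (1) is the standard L2-regularized squared-error objective over the observed entries \(\Omega\) (our reading of the paper's formulation, not quoted from it):

\[
\lambda\Bigl(\sum_i \Vert \mathbf{w}_{i}\Vert^2 + \sum_j \Vert \mathbf{h}_{j}\Vert^2\Bigr)
\qquad\text{versus}\qquad
\lambda\Bigl(\sum_i |\Omega_{i}|\,\Vert \mathbf{w}_{i}\Vert^2 + \sum_j |\bar{\Omega}_{j}|\,\Vert \mathbf{h}_{j}\Vert^2\Bigr),
\]

where \(|\Omega_{i}|\) is the number of ratings given by user \(i\) and \(|\bar{\Omega}_{j}|\) the number of ratings received by item \(j\). The weighted variant penalizes heavily rated users and items more strongly, following the weighted-\(\lambda\)-regularization used in [2].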

References

  1. Dror G, Koenigstein N, Koren Y, Weimer M (2012) The Yahoo! music dataset and KDD-Cup’11. In: JMLR workshop and conference proceedings: proceedings of KDD Cup 2011 competition, vol. 18, pp 3–18

  2. Zhou Y, Wilkinson D, Schreiber R, Pan R (2008) Large-scale parallel collaborative filtering for the Netflix prize. In: Proceedings of the international conference on algorithmic aspects in information and management

  3. Koren Y, Bell RM, Volinsky C (2009) Matrix factorization techniques for recommender systems. IEEE Comput 42:30–37

  4. Takács G, Pilászy I, Németh B, Tikk D (2009) Scalable collaborative filtering approaches for large recommender systems. JMLR 10:623–656

  5. Chen P-L, Tsai C-T, Chen Y-N, Chou K-C, Li C-L, Tsai C-H, Wu K-W, Chou Y-C, Li C-Y, Lin W-S, Yu S-H, Chiu R-B, Lin C-Y, Wang C-C, Wang P-W, Su W-L, Wu C-H, Kuo T-T, McKenzie TG, Chang Y-H, Ferng C-S, Niv, Lin H-T, Lin C-J, Lin S-D (2012) A linear ensemble of individual and blended models for music rating prediction. In: JMLR workshop and conference proceedings: proceedings of KDD Cup 2011 competition, vol. 18, pp 21–60

  6. Langford J, Smola A, Zinkevich M (2009) Slow learners are fast. In: NIPS

  7. Gemulla R, Haas PJ, Nijkamp E, Sismanis Y (2011) Large-scale matrix factorization with distributed stochastic gradient descent. In: ACM KDD

  8. Recht B, Re C (2013) Parallel stochastic gradient algorithms for large-scale matrix completion. Math Program Comput 5(2):201–226

  9. Zinkevich M, Weimer M, Smola A, Li L (2010) Parallelized stochastic gradient descent. In: NIPS

  10. Niu F, Recht B, Re C, Wright SJ (2011) Hogwild!: a lock-free approach to parallelizing stochastic gradient descent. In: NIPS

  11. Cichocki A, Phan A-H (2009) Fast local algorithms for large scale nonnegative matrix and tensor factorizations. IEICE Trans Fundam Electron Commun Comput Sci E92-A(3):708–721

  12. Hsieh C-J, Dhillon IS (2011) Fast coordinate descent methods with variable selection for non-negative matrix factorization. In: ACM KDD

  13. Bottou L (2010) Large-scale machine learning with stochastic gradient descent. In: Proceedings of the international conference on computational statistics

  14. Agarwal A, Duchi JC (2011) Distributed delayed stochastic optimization. In: NIPS

  15. Bertsekas DP (1999) Nonlinear programming, 2nd edn. Athena Scientific, Belmont, MA

  16. Hsieh C-J, Chang K-W, Lin C-J, Keerthi SS, Sundararajan S (2008) A dual coordinate descent method for large-scale linear SVM. In: ICML

  17. Yu H-F, Huang F-L, Lin C-J (2011) Dual coordinate descent methods for logistic regression and maximum entropy models. Mach Learn 85(1–2):41–75

  18. Hsieh C-J, Sustik M, Dhillon IS, Ravikumar P (2011) Sparse inverse covariance matrix estimation using quadratic approximation. In: NIPS

  19. Pilászy I, Zibriczky D, Tikk D (2010) Fast ALS-based matrix factorization for explicit and implicit feedback datasets. In: ACM RecSys

  20. Bell RM, Koren Y, Volinsky C (2007) Modeling relationships at multiple scales to improve accuracy of large recommender systems. In: ACM KDD

  21. Ho N-D, Van Dooren P, Blondel VD (2011) Descent methods for nonnegative matrix factorization. In: Numerical linear algebra in signals, systems and control. Springer, Netherlands, pp 251–293

  22. Thakur R, Gropp W (2003) Improving the performance of collective operations in MPICH. In: Proceedings of the European PVM/MPI users' group meeting

  23. Low Y, Gonzalez J, Kyrola A, Bickson D, Guestrin C, Hellerstein JM (2010) GraphLab: a new framework for parallel machine learning. CoRR abs/1006.4990

  24. Chung F, Lu L, Vu V (2003) The spectra of random graphs with given expected degrees. Internet Math 1(3):257–275

  25. Low Y, Gonzalez J, Kyrola A, Bickson D, Guestrin C, Hellerstein JM (2012) Distributed GraphLab: a framework for machine learning in the cloud. PVLDB 5(8):716–727

  26. Yuan G-X, Chang K-W, Hsieh C-J, Lin C-J (2010) A comparison of optimization methods and software for large-scale l1-regularized linear classification. J Mach Learn Res 11:3183–3234

Acknowledgments

This research was supported by NSF Grants CCF-0916309, CCF-1117055 and DOD Army Grant W911NF-10-1-0529. We also thank the Texas Advanced Computer Center (TACC) for providing computing resources required to conduct experiments in this work.

Author information

Correspondence to Hsiang-Fu Yu.

Cite this article

Yu, HF., Hsieh, CJ., Si, S. et al. Parallel matrix factorization for recommender systems. Knowl Inf Syst 41, 793–819 (2014). https://doi.org/10.1007/s10115-013-0682-2
