Neurocomputing

Volume 97, 15 November 2012, Pages 52-62

Inducible regularization for low-rank matrix factorizations for collaborative filtering

https://doi.org/10.1016/j.neucom.2012.05.010

Abstract

Low-rank matrix factorization with missing data has become an effective methodology for collaborative filtering applications since it can generate high-quality rating predictions for recommendation systems. The performance of low-rank factorization, however, depends critically on how the low-rank model is regularized to mitigate over-fitting to the observed data. The objective of this paper is to propose a novel regularization technique which we call inducible regularization. It utilizes pre-estimated ratings on a pre-specified subset of the ratings to regularize the solutions of low-rank matrix factorization. We develop two algorithms for solving the new regularized problem, via alternating least squares iterations and stochastic gradient descent. We also devise a fast implementation of the alternating least squares algorithm which is suitable for parallel computing. Numerical experiments on three real-world data sets, MovieLens, Jester, and EachMovie, compare the proposed algorithms with the existing algorithms ALS, SGD, and SVD++, which solve low-rank matrix factorization with classical regularizations, and illustrate the superior performance of our proposed algorithms.

Introduction

Collaborative filtering (CF) has become a dominant technique for recommendation systems, with widespread deployment for the recommendation of movies, music, news articles, and products by internet companies such as Amazon, Netflix, Yahoo! and Google [1], [2], [3], [9], [23]. A popular methodology for solving the CF problem relies on regularized low-rank matrix factorizations; iterative methods based on alternating least squares or gradient descent techniques have been proposed for the resulting optimization problem [12], [22], [7], [21], giving rise to attractive prediction accuracy and reasonable scalability for large datasets.

We start our discussion with a brief presentation of the CF problem. Consider a rating matrix $R$ of $m$ users on $n$ items that is incomplete, i.e., only a subset of items $i \in I_u$ is rated by each user $u$ and many other ratings are unknown. Let $K$ be the set of all index pairs $(u,i)$ of known entries in the rating matrix, i.e.,
$$K = \{(u,i) : i \in I_u,\ u = 1, 2, \ldots, m\}.$$
The goal of CF is to estimate the unknown ratings $r_{u,i}$ based on the known ratings $r_{u,i}$, $(u,i) \in K$. In low-rank matrix factorization approaches for CF, the ratings are modeled in the form $\hat{r}_{u,i} = p_u^T q_i$ for all users on all items, where $p_u$ and $q_i$ are column vectors of a pre-specified dimension $k$, generally much smaller than $m$ and $n$, hence the name low-rank matrix factorization. The goal is then to minimize the mean squared error (MSE)
$$\epsilon_{\hat{K}} = \frac{1}{|\hat{K}|} \sum_{(u,i)\in\hat{K}} \left(r_{u,i} - p_u^T q_i\right)^2$$
of the low-rank estimation for the whole or a pre-specified subset of the unknown ratings. Here, $\hat{K}$ is the index set of those unknown ratings that are of interest. A general approach that aims at achieving this goal starts with reducing the MSE of the low-rank approximation on the observed known ratings:
$$\epsilon_K = \frac{1}{|K|} \sum_{(u,i)\in K} \left(r_{u,i} - p_u^T q_i\right)^2.$$
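As a concrete reading of these formulas, the MSE of a candidate factorization on any index set can be evaluated as follows (a minimal numpy sketch; the function name, argument layout, and dense storage of $R$ are our own illustration, not the paper's code):

```python
import numpy as np

def mse_on_index_set(R, P, Q, idx):
    """MSE of the low-rank model r_hat[u, i] = p_u^T q_i on the index pairs
    in idx, matching the epsilon_K formula above. P is k-by-m with columns
    p_u; Q is k-by-n with columns q_i; R is a dense rating array."""
    errs = [(R[u, i] - P[:, u] @ Q[:, i]) ** 2 for (u, i) in idx]
    return sum(errs) / len(errs)
```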

It is pointed out in [24] that two different approximations with equal low rank can achieve the minimum MSE on the known entries while their estimates for the unknowns can be very different. Additionally, it has long been recognized that regularization of low-rank matrix factorization is necessary to obtain good performance. One regularization scheme that has been commonly used in the literature is to penalize the norms $\|p_u\|^2$ and $\|q_i\|^2$ of the factors in the low-rank approximation in Eq. (1). This particular regularization can reduce the errors of the estimates of the unobserved ratings if the regularization parameter is suitably chosen. However, it is difficult to choose the right regularization parameters, and the improvement produced by the regularization is also limited. This thorny issue motivates several recent methods for improving the regularized low-rank matrix factorization methods, including a probabilistic method proposed in [16], [15], where Type II MLE and hierarchical Bayesian methods aided by Gibbs sampling were used to estimate the low-rank factors, providing a principled approach for addressing the issue of selecting the regularization parameters.
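For reference, the classical norm-penalized objective discussed above has roughly the following shape (a hedged sketch: we penalize the full Frobenius norms of the factor matrices, while the cited works vary in how the per-factor penalties are weighted):

```python
def classical_regularized_loss(R, P, Q, idx, lam):
    """Squared errors on the observed index set plus lam times the squared
    norms of all factors. One common variant of the classical penalty;
    not the paper's exact formulation. R, P, Q are numpy arrays."""
    data_term = sum((R[u, i] - P[:, u] @ Q[:, i]) ** 2 for (u, i) in idx)
    penalty = lam * ((P ** 2).sum() + (Q ** 2).sum())
    return 0.5 * data_term + penalty
```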

For optimizing the objective function resulting from the regularized low-rank matrix factorization, alternating least squares (ALS) is a commonly used approach with linear convergence for solving the CF problem, though its efficiency is rather limited [7]. It frequently occurs that, for the original regularization model, the approximation errors of ALS on unknown ratings increase when we decrease the errors on observed entries by tuning the regularization parameters; i.e., the observed data are over-fitted by the low-rank approximation obtained via ALS for the classical regularization problem. Similar issues also arise with gradient descent methods such as the stochastic gradient descent (SGD) algorithm; they are more efficient than the alternating methods for solving large-scale problems, although they tend to converge slowly.

From a geometric point of view, the original regularization in terms of the norms of the factors looks for solutions with balanced low-rank factors P and Q that are shrunk towards zero. From a statistical viewpoint, the regularization scheme may also help to obtain a solution with balanced covariances for its two factors. Such constraints make sense if the true factor entries are normally distributed with zero mean and small covariance. However, this is generally not the case in the CF problem.

In this paper, we will propose a new regularization method for the low-rank matrix factorization in CF. Distinct from the classical regularized models, the new regularization is induced by pre-estimated ratings for a subset of the unknown ratings. The main motivation is that a good pre-estimation of the unknown ratings is helpful to guide the iterative solutions towards the true ratings. Another key difference is that we do not impose restrictions on the factors of the low-rank approximation, but rather on the whole approximation matrix itself. It is possible that the resulting solutions may depend on the accuracy of the pre-estimation to some extent. However, as shown in our numerical experiments, with a robust estimate of the unknown ratings, this new regularization method can reduce the negative effects of possible errors in the pre-estimates. A rough estimate with rank much smaller than the desired rank of the matrix factorization can serve as a robust choice for the pre-estimation; a sketch of one such scheme follows.
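One plausible realization of such a low-rank pre-estimation is to fill the unknown entries with the mean rating and keep the leading singular triplet (our illustration under those assumptions; the paper's own rank-one and rank-two schemes are given in Section 3):

```python
import numpy as np

def rank_one_preestimate(R, mask):
    """Rough rank-one pre-estimate R_hat: fill unknown entries of R with the
    mean of the known ratings (mask marks known entries), then truncate the
    SVD of the filled matrix to rank one."""
    r_bar = R[mask].mean()            # average of the observed ratings
    F = np.where(mask, R, r_bar)      # mean-filled dense matrix
    U, s, Vt = np.linalg.svd(F, full_matrices=False)
    return s[0] * np.outer(U[:, 0], Vt[0, :])
```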

We will consider two versions of the inducible regularization model, depending on the size of the subset of unknowns that we intend to estimate, and correspondingly, we will provide two algorithms for solving the new regularization problems. One is an inducible SGD (ISGD) that successively uses separate gradients of the objective function, for the case where the subset of pre-estimated unknown ratings is as sparse as K. The other is an inducible ALS (IALS), for the case where the whole set of unknown ratings is used in the pre-estimation. We will exploit the structure of the low-rank approximations and propose a fast implementation of IALS. Compared with the original regularization, the new regularization model has three notable properties:

1. It can improve the convergence behavior and reduce the over-fitting of ALS with the original regularization model. The algorithm IALS can smoothly solve the new regularized problem with monotone convergence.

2. It can improve the efficiency of gradient descent methods. ISGD with the inducible regularization outperforms SGD for low-rank factorization using the classical regularization in all the numerical experiments we carried out, over a range of factorization ranks and data sizes.

3. Compared with SVD++, ISGD uses a simple user factor in the low-rank factorization and converges faster. SVD++ needs additional gradient descent steps for updating the newly added factors, roughly doubling the computational cost relative to ISGD.

The rest of this paper is organized as follows. In Section 2, we briefly summarize the previous models for the CF problem and their corresponding algorithms. Our inducible regularization model and the algorithms ISGD and IALS are given in Section 3, together with some simple pre-estimation schemes using rank-one or rank-two approximations. These simple methods can be used to inexpensively obtain the pre-estimations used for regularization. We also give a fast implementation of IALS in Section 5. Experiments are reported in Section 7 to show the numerical behavior of the proposed algorithms for solving the inducible regularization model. We give detailed comparisons of IALS and ISGD with the algorithms ALS, SGD, and SVD++ that use the classical regularizations on three real-world data sets.

Section snippets

Problem formulation and algorithms

Assume that we are given partial rating values $r_{u,i}$ of user $u$ on item $i$ for $(u,i) \in K$. The baseline estimation of ratings is postulated by the model
$$\hat{r}_{u,i} = \bar{r} + a_u + b_i,$$
where $\bar{r}$ is the average value of the known ratings, and the parameters $a_u$ and $b_i$ denote the user bias and item bias relative to the average rating, respectively. These two kinds of bias can be estimated by solving the following regularized least squares problem:
$$\min_{a,b}\ \frac{1}{2} \sum_{(u,i)\in K} \left(r_{u,i} - \bar{r} - a_u - b_i\right)^2 + \lambda\Big(\sum_u a_u^2 + \sum_i b_i^2\Big).$$
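This bias problem can be solved, for instance, by alternating closed-form updates over $a$ and $b$, sketched below (the solver choice and the exact scaling of $\lambda$ in the denominators are our assumptions, not taken from the paper):

```python
import numpy as np

def fit_biases(R, mask, lam, n_sweeps=20):
    """Alternating closed-form updates for the baseline model
    r_hat = r_bar + a_u + b_i. mask is a boolean array marking known
    entries; lam absorbs any constant factor from the 1/2 in the objective."""
    m, n = R.shape
    r_bar = R[mask].mean()
    a, b = np.zeros(m), np.zeros(n)
    for _ in range(n_sweeps):
        for u in range(m):            # user biases, with item biases fixed
            iu = mask[u]              # items rated by user u
            a[u] = (R[u, iu] - r_bar - b[iu]).sum() / (lam + iu.sum())
        for i in range(n):            # item biases, with user biases fixed
            ui = mask[:, i]           # users who rated item i
            b[i] = (R[ui, i] - r_bar - a[ui]).sum() / (lam + ui.sum())
    return r_bar, a, b
```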

Inducible regularization

In this section, we consider a novel regularization model of low-rank approximation for the CF problem, which can help to obtain more reasonable solutions. The motivation is that if one has a rough pre-estimation of the unknown ratings, rather than forcing the factors to shrink to zero as in the original regularization model, which is somewhat arbitrary anyway, we let the predictions given by the factors be close to the pre-estimation. That is, the pre-estimation could be taken as an "anchor…

Stochastic gradient descent algorithm

For a sparse set $\hat{K}$, we propose to employ an idea similar to the stochastic gradient method for solving the inducible model (9). The difference is that we run the loop over both $K$ and $\hat{K}$. That is, one whole sweep (one iteration) is now given by
$$(u,i) \in K:\quad e_{u,i} = r_{u,i} - p_u^T q_i;\quad p_u \leftarrow p_u + \gamma\left(e_{u,i} q_i - \lambda p_u\right);\quad q_i \leftarrow q_i + \gamma\left(e_{u,i} p_u - \lambda q_i\right);$$
$$(u,i) \in \hat{K}:\quad \hat{e}_{u,i} = \hat{r}_{u,i} - p_u^T q_i;\quad p_u \leftarrow p_u + \mu\gamma\left(\hat{e}_{u,i} q_i - \lambda p_u\right);\quad q_i \leftarrow q_i + \mu\gamma\left(\hat{e}_{u,i} p_u - \lambda q_i\right).$$

The cost of the above iteration is $O(k|K| + k|\hat{K}|)$ with approximation rank $k$. Clearly, this iteration scheme is as fast as SGD if K…
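The sweep above translates directly into code. The following sketch applies the two passes in sequence (data layout and function name are our own; the paper specifies only the update rules):

```python
def isgd_sweep(P, Q, known, induced, gamma, lam, mu):
    """One ISGD sweep. known holds triples (u, i, r_ui) for the set K;
    induced holds (u, i, r_hat_ui) for the pre-estimated set K_hat.
    P (k-by-m) and Q (k-by-n) are numpy arrays updated in place."""
    for u, i, r in known:                        # pass over K
        e = r - P[:, u] @ Q[:, i]
        pu = P[:, u].copy()                      # update q_i with the old p_u
        P[:, u] += gamma * (e * Q[:, i] - lam * P[:, u])
        Q[:, i] += gamma * (e * pu - lam * Q[:, i])
    for u, i, r_hat in induced:                  # pass over K_hat, damped by mu
        e = r_hat - P[:, u] @ Q[:, i]
        pu = P[:, u].copy()
        P[:, u] += mu * gamma * (e * Q[:, i] - lam * P[:, u])
        Q[:, i] += mu * gamma * (e * pu - lam * Q[:, i])
    return P, Q
```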

Alternating iteration and its fast implementation

Let $R = (r_{u,i})$ be the rating matrix and $\hat{R}$ be an inducing matrix that approximates the partial entries of $R$ corresponding to $\hat{K}$. The optimization (8) can be represented in the weighted form
$$\min_{P,Q}\ \frac{1}{2} \sum_{(u,i)\in \bar{K}} w_{u,i} \left(\tilde{r}_{u,i} - p_u^T q_i\right)^2$$
with $\bar{K} = K \cup \hat{K}$ and
$$w_{u,i} = \begin{cases} 1, & (u,i) \in K; \\ \mu, & (u,i) \in \hat{K}, \end{cases} \qquad \tilde{r}_{u,i} = \begin{cases} r_{u,i}, & (u,i) \in K; \\ \hat{r}_{u,i}, & (u,i) \in \hat{K}. \end{cases}$$
The prediction matrix $\hat{R} = cE + P_0^T Q_0$, with $E$ the matrix of all ones, can be rewritten as
$$\hat{R} = \hat{P}^T \hat{Q}$$
by adding one row of all $c$ to $P_0$ and one row of all ones to $Q_0$. The number of rows in $\hat{P}$ or $\hat{Q}$ is…
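The rank augmentation step can be verified in a few lines (a sketch checking only the identity $\hat{P}^T\hat{Q} = cE + P_0^TQ_0$; variable names are ours):

```python
import numpy as np

def augment_factors(P0, Q0, c):
    """Append a constant row c to P0 and a row of ones to Q0 so that
    P_hat.T @ Q_hat = c * E + P0.T @ Q0, with E the all-ones matrix."""
    m, n = P0.shape[1], Q0.shape[1]
    P_hat = np.vstack([P0, c * np.ones((1, m))])
    Q_hat = np.vstack([Q0, np.ones((1, n))])
    return P_hat, Q_hat

# Quick numerical check of the identity:
rng = np.random.default_rng(0)
P0, Q0, c = rng.normal(size=(2, 4)), rng.normal(size=(2, 5)), 3.0
P_hat, Q_hat = augment_factors(P0, Q0, c)
assert np.allclose(P_hat.T @ Q_hat, c * np.ones((4, 5)) + P0.T @ Q0)
```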

Sensitivity analysis

Assume that the rating matrix $R$ has the optimal rank-$k$ approximation $R^*$. Let $E^* = R - R^*$ denote the error matrix of the approximation. It is clear that the optimization problem (8) with the inducible regularization will give a solution as good as $R^*$ in the ideal case when $\hat{R}$ is as good an estimate as $R^*$. In this section, we give an error analysis for the optimal solution of (8) for $\hat{K} = K^c$ in terms of the prediction error $\|\hat{R} - R^*\|_F$. Practically, we need only the entries of $\hat{E} = \hat{R} - R^*$ in the set $\hat{K}$. Thus, the…

Numerical experiments

In this section, we demonstrate several numerical properties of the inducible regularization for low-rank factorization. We apply the algorithms IALS and ISGD to three real-world data sets: MovieLens, Jester, and EachMovie. IALS and ISGD are compared with the alternating iteration method ALS and the stochastic algorithm SGD with the original regularization models (2), (3). We will show how the inducible models improve the convergence and enhance the convergence capabilities of the…

Conclusion

In this paper, we proposed a novel regularization model of low-rank matrix factorization using an inducible pre-estimation of a set of unknown ratings for the problem of collaborative filtering. We developed two algorithms, the alternating iteration algorithm IALS and the gradient descent algorithm ISGD, to solve the resulting optimization problems. Our inducible regularization model can ameliorate the over-fitting of the alternating iteration algorithm ALS solving the low-rank factorization using the…


References (25)

  • K. Zhao et al., Successively alternate least square for low-rank matrix factorization with bounded missing data, Comput. Vision Image Understanding (2010).
  • G. Adomavicius et al., Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions, IEEE Trans. Knowl. Data Eng. (2005).
  • R. Bell, Y. Koren, Scalable collaborative filtering with jointly derived neighborhood interpolation weights, in: IEEE...
  • A. Das, M. Datar, A. Garg, S. Rajaram, Google news personalization: scalable online collaborative filtering, in:...
  • Y. Hu, Y. Koren, C. Volinsky, Collaborative filtering for implicit feedback datasets, in: ICDM '08: Proceedings of the...
  • H. Kim et al., Nonnegative matrix factorization based on alternating nonnegativity constrained least squares and active set method, SIAM J. Matrix Anal. Appl. (2008).
  • Y. Koren, Factorization meets the neighborhood: a multifaceted collaborative filtering model, in: KDD '08: Proceedings...
  • Y. Koren et al., Matrix factorization techniques for recommender systems, Computer (2009).
  • M. Kurucz, A.A. Benczur, K. Csalogany, Methods for large scale SVD with missing values, in: Proceedings of KDD Cup and...
  • G. Linden et al., Amazon.com recommendations: item-to-item collaborative filtering, IEEE Internet Comput. (2003).
  • B. Marlin, Collaborative Filtering: A Machine Learning Perspective, Master's thesis, University of Toronto, Computer...
  • R. Mazumder et al., Spectral regularization algorithms for learning large incomplete matrices, J. Mach. Learn. Res. (2010).

Zhenyue Zhang received his B.S. degree in mathematics from Fudan University, Shanghai, China, in 1982, and his Ph.D. in scientific computing from Fudan University in 1989. Zhang was an assistant professor in the Department of Mathematics, Fudan University from 1982 to 1985, and has been a full professor in the Department of Mathematics, Zhejiang University since 1998. Zhang's current research interests include machine learning and its applications, numerical linear algebra, and recommendation systems.

Keke Zhao received the B.S. degree from the Department of Mathematics, Xiamen University, China, in 2007, and he is now a Ph.D. student at Zhejiang University. Zhao's research interests include machine learning, recommendation systems, and numerical optimization.

Hongyuan Zha received his B.S. degree in mathematics from Fudan University in Shanghai in 1984, and his Ph.D. in scientific computing from Stanford University in 1993. Zha was a faculty member of the Department of Computer Science and Engineering at Pennsylvania State University from 1992 to 2006, and he worked from 1999 to 2001 at Inktomi Corporation. He is now a professor in the School of Computational Science and Engineering, College of Computing, Georgia Institute of Technology. Zha's current research interests include web search, recommendation systems, and machine learning applications.

1. The work of this author was supported in part by NSFC projects 11071218 and 10911120395, and by the National Basic Research Program of China (973 Program) 2009CB320804.

2. The work of this author was supported by NSF grant IIS-1116886 and a grant from Yahoo!.
