1 Introduction

With the advent of the digital era, recommender systems have attracted increasing attention and have become one of the most effective tools for dealing with the "information overload" problem [1]. Correspondingly, a number of effective recommendation techniques have been proposed over the last decade, and many have been successfully applied in business systems.

Collaborative filtering has been the most active approach in the field of recommendation in recent years, mainly because it is simple and effective and played an important role in the Netflix Prize competition. Based on the hypothesis that similar users share the same tastes and similar items attract the same fans [1, 2], memory-based collaborative filtering algorithms, such as User-KNN (the K-nearest-neighbor algorithm based on user rating similarity) [3,4,5] and Item-KNN (the K-nearest-neighbor algorithm based on item rating similarity) [6, 7], have been widely studied and applied in many commercial systems, such as Amazon and eBay. However, these methods suffer from an inherent weakness: the sparsity of the user-item rating matrix (typically fewer than 1% of the entries in a business system are observed) makes it hard to find similar users and items. More recently, model-based collaborative filtering algorithms have gained increasing attention, mainly because they are scalable and can partially alleviate the sparsity problem. In model-based algorithms, a predefined model is trained on the observed ratings and other available information. Representative model-based algorithms include Singular Value Decomposition (SVD) [1], SVD++ [1], and Probabilistic Matrix Factorization (PMF) [8], which achieve good results but still perform poorly when the observed data are sparse.

Recently, many scholars [9,10,11,12,13,14,15] have proposed fused collaborative filtering algorithms that exploit the social information of users and the structural information of items, and have achieved good results, especially under sparsity. Ma et al. [9] addressed the prediction quality, scalability, and data sparsity problems with a novel social recommendation algorithm based on PMF, which outperforms several baselines. Lei et al. [14] considered the influence of item relations and fused them into social recommendation on the basis of Ma's work. However, how to obtain and measure the social and structural information is the key challenge, and it can strongly influence the recommendation results. Current solutions mainly fall into two types: (1) using explicit social network information for recommendation; (2) computing similarities among users (items) from implicit label information. In reality, however, it is difficult to obtain social network information or sufficient implicit label information about the users (items).

To overcome the above defects, and based on the ideas that item similarity and user similarity affect a user's behavior, and that historical behavior on the web in turn reflects item and user similarity, we propose a novel two-phase recommender framework named IU-PMF, which first abstracts item-similarity and user-similarity attributes from the user-item ratings and then fuses them with the PMF model to make more personalized and accurate recommendations. Although the idea of fusing similarity into matrix factorization is not entirely new, to the best of our knowledge IU-PMF is the most concise such model, and the complexity analysis shows that it is more scalable than other state-of-the-art similarity-fused models, such as NHPMF [16] and SBMF [17]. Additionally, experiments on three real-world datasets show that IU-PMF stands out in terms of RMSE.

The remainder of the paper is organized as follows. In the next section we review related work. The IU-PMF model and its complexity analysis are presented in Sect. 3. Section 4 presents and analyzes the experimental results. Finally, conclusions and future work are given in Sect. 5.

2 Related Work

In this section, we review the most relevant recent work. Specifically, we cover PMF, several recent MF-based recommendation algorithms, and other state-of-the-art similarity-fused models.

Building on the ordinary matrix factorization model, Mnih et al. [8] presented PMF, which assumes that the user-item rating matrix and the derived user and item latent matrices follow Gaussian distributions, and combines the three via the Bayesian principle. Their experimental results demonstrate that PMF performs well on the Netflix dataset, which is sparse, large, and very imbalanced. However, many studies have shown that PMF alone poorly captures local relationships [1].

In recent years, to further improve performance, PMF has been widely studied and applied to social and context-aware recommendation. By employing both user-item rating records and users' social network information, Ma et al. [9] proposed a novel PMF-based social recommendation approach named SoRec. In SoRec, they introduced a latent factor feature matrix and combined it with the latent user and item feature matrices in a single Bayesian formulation, which improved prediction accuracy and alleviated the data sparsity problem, especially for users with few or no ratings. Building on this line of work, Lei et al. [14] presented the PMFUI model, which incorporates item relations into social recommendation based on the idea that related items are more likely to be enjoyed by the same user. Specifically, PMFUI considers the influence of user connections and item relations simultaneously and uses the shared latent feature space to constrain the objective function; their experimental results show that the method outperforms SoRec on the 2012 KDD Cup dataset. Sun et al. [18] noticed the sequential correlations among users and items and proposed a PMF-based method that captures the sequential behaviors of items and users, which helps find the neighbors most influential to a given item (user). The method combines the recommendation process with these influential neighbors on top of PMF and achieves good results.

Although the methods mentioned above achieve good results, most of them require additional social information or implicit label information, which is usually not available in practice. To overcome this weakness, recent research has focused on similarity fusion, which combines the advantages of neighborhood-aware methods and the matrix factorization model. Wu et al. [16] proposed NHPMF, a two-stage recommendation model that first uses tagging data to select the neighbors of each user and item and then incorporates them into the factorization. Although NHPMF improves rating accuracy considerably when extra tagging data are available, it performs poorly when only rating data exist. Wang et al. [17] proposed SBMF, which builds clusters of users and items from the rating data and then incorporates the cluster information into matrix factorization; the idea is effective, but the cluster computation phase is time-consuming and thus limits scalability. Motivated by these observations, this paper simply mines item and user similarities from the user-item ratings themselves and incorporates them into PMF through a concise fused model with high scalability, in order to improve recommendation performance.

3 IU-PMF

In this section, we first present the notation and an overview of IU-PMF. Then we introduce the user (item) similarity matrix computation method. After that, the IU-PMF model is described in detail. Finally, the complexity analysis is given.

3.1 Notations and Description of IU-PMF

Suppose we have N users, M items, and integer user-item rating values ranging from 1 to K. To make the model more adaptable, we use the function \( t(x) = (x - 1)/(K - 1) \) to map the ratings to the interval [0, 1]. Let \( R_{i,j} \) be the rating of user i for item j. Let \( U \in R^{L \times N} \) and \( V \in R^{L \times M} \) be the latent user and item feature matrices, with column vectors \( U_{i} \) and \( V_{j} \) representing the user-specific and item-specific latent feature vectors, respectively. Let \( D \in R^{N \times N} \) and \( S \in R^{M \times M} \), abstracted from the user-item ratings, be the user similarity matrix and item similarity matrix, respectively. IU-PMF connects user similarity, item similarity, and the user-item ratings through shared user and item latent feature spaces; the graphical model for IU-PMF is shown in Fig. 1.

Fig. 1. Graphical model for IU-PMF

3.2 User (Item) Similarity

In IU-PMF we need to model user (item) similarity, and because we connect user similarity, item similarity, and the user-item ratings (mapped to the interval [0, 1]) in the shared user and item latent feature spaces, the output of the similarity measure must also lie in [0, 1]. In statistics, the similarity of two samples can be measured in linear and non-linear ways, such as cosine similarity [19], the Pearson correlation coefficient (PCC) [20], and the radial basis function (RBF) [21]. Among these, PCC is the most commonly used for rating evaluation [1], while cosine similarity is simpler than PCC and has been reported to be a better measure for top-N recommendation [1]. Therefore, to keep the modeling simple, we model user (item) similarity with PCC and with cosine similarity, respectively, and explore their performance in the experimental section.

The range of similarity measured by PCC or cosine similarity is [−1, 1], so we use the scale function \( t(x) = (1 + x)/2 \) to map it to [0, 1]. Although this idea is simple, as we demonstrate in the experimental results section, it yields good performance.
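For concreteness, the similarity phase can be sketched as follows in Python/NumPy. This is a minimal illustration with function and variable names of our own choosing (not code from the paper), assuming a dense N × M rating matrix in which 0 marks a missing rating; the minimum co-rating threshold is likewise an illustrative assumption. The item similarity matrix S and its indicator are obtained the same way by applying the function to the transposed matrix.

```python
import numpy as np

def user_similarity(R, use_pcc=True):
    """Compute the user similarity matrix D (and its indicator I^D) from an
    N x M rating matrix R, where 0 marks a missing rating. Similarities are
    scaled from [-1, 1] to [0, 1] via t(x) = (1 + x) / 2 (Sect. 3.2)."""
    N = R.shape[0]
    D = np.zeros((N, N))
    I_D = np.zeros((N, N), dtype=bool)           # 1 iff similarity computable
    for i in range(N):
        for k in range(i + 1, N):
            co = (R[i] > 0) & (R[k] > 0)         # items co-rated by i and k
            if co.sum() < 2:                     # illustrative threshold
                continue
            x, y = R[i, co], R[k, co]
            if use_pcc:                          # Pearson correlation (PCC)
                xc, yc = x - x.mean(), y - y.mean()
                denom = np.linalg.norm(xc) * np.linalg.norm(yc)
                if denom == 0:                   # constant ratings: skip
                    continue
                sim = (xc @ yc) / denom
            else:                                # cosine similarity
                sim = (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))
            D[i, k] = D[k, i] = (1.0 + sim) / 2.0
            I_D[i, k] = I_D[k, i] = True
    return D, I_D
```

In this sketch, `user_similarity(R.T)` yields the item similarity matrix S and its indicator.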

3.3 IU-PMF Model

Similar to PMF, the conditional distribution of IU-PMF over the observed user-item ratings is defined as:

$$ p(R|U,V,\sigma_{R}^{2} ) = \prod\limits_{i = 1}^{N} {\prod\limits_{j = 1}^{M} {N(R_{i,j} |g(U_{i}^{T} V_{j} ),\sigma_{R}^{2} )^{{I_{i,j}^{R} }} } } $$
(1)

where \( N(x|\mu ,\sigma^{2} ) \) denotes the Gaussian density with mean \( \mu \) and variance \( \sigma^{2} \), \( g(x) = 1/(1 + \exp ( - x)) \) is the logistic function, and \( I_{i,j}^{R} \) is the indicator function that equals 1 if user i rated item j and 0 otherwise.

The conditional distribution over the user similarity matrix D is defined as:

$$ P(D|U,\sigma_{D}^{2} ) = \prod\limits_{i = 1}^{N} {\prod\limits_{k = 1}^{N} {N(D_{i,k} |g(U_{i}^{T} U_{k} ),\sigma_{D}^{2} )^{{I_{i,k}^{D} }} } } $$
(2)

where \( I_{i,k}^{D} \) is the indicator function returning 1 if the similarity between user i and k can be calculated and 0 otherwise.

Similar to user similarity, the conditional distribution over the item similarity matrix S is defined as:

$$ P(S|V,\sigma_{S}^{2} ) = \prod\limits_{j = 1}^{M} {\prod\limits_{p = 1}^{M} {N(S_{j,p} |g(V_{j}^{T} V_{p} ),\sigma_{S}^{2} )^{{I_{j,p}^{S} }} } } $$
(3)

where \( I_{j,p}^{S} \) is the indicator function returning 1 if the similarity between item j and p can be calculated and 0 otherwise.

We also place zero-mean spherical Gaussian priors on U and V:

$$ \begin{aligned} P(U|\sigma_{U}^{2} ) = \prod\limits_{i = 1}^{N} {N(U_{i} |0,\sigma_{U}^{2} {\text{I}})} \hfill \\ P(V|\sigma_{V}^{2} ) = \prod\limits_{j = 1}^{M} {N(V_{j} |0,\sigma_{V}^{2} {\text{I}})} \hfill \\ \end{aligned} $$
(4)

Thus, by a simple Bayesian derivation, the posterior distribution is as follows:

$$ \begin{aligned} & P(U,V|R,D,S,\sigma_{R}^{2} ,\sigma_{D}^{2} ,\sigma_{S}^{2} ,\sigma_{U}^{2} ,\sigma_{V}^{2} ) \\ & \propto P(R|U,V,\sigma_{R}^{2} )*P(D|U,\sigma_{D}^{2} )*P(S|V,\sigma_{S}^{2} )*P(U|\sigma_{U}^{2} )*P(V|\sigma_{V}^{2} ) \\ & = \prod\limits_{i = 1}^{N} {\prod\limits_{j = 1}^{M} {N(R_{i,j} |g(U_{i}^{T} V_{j} ),\sigma_{R}^{2} )^{{I_{i,j}^{R} }} } } \times \prod\limits_{i = 1}^{N} {\prod\limits_{k = 1}^{N} {N(D_{i,k} |g(U_{i}^{T} U_{k} ),\sigma_{D}^{2} )^{{I_{i,k}^{D} }} } } \\ & \quad \times \prod\limits_{j = 1}^{M} {\prod\limits_{p = 1}^{M} {N(S_{j,p} |g(V_{j}^{T} V_{p} ),\sigma_{S}^{2} )^{{I_{j,p}^{S} }} } } \times \prod\limits_{i = 1}^{N} {N(U_{i} |0,\sigma_{U}^{2} {\text{I}})} \times \prod\limits_{j = 1}^{M} {N(V_{j} |0,\sigma_{V}^{2} {\text{I}})} \\ \end{aligned} $$
(5)

The log of the posterior distribution (Eq. 5) is given by:

$$\begin{aligned} &\ln P(U,V|R,D,S,\sigma_{R}^{2} ,\sigma_{D}^{2} ,\sigma_{S}^{2} ,\sigma_{U}^{2} ,\sigma_{V}^{2} ) \nonumber\\ & = - \frac{1}{{2\sigma_{R}^{2} }}\sum\limits_{i = 1}^{N} {\sum\limits_{j = 1}^{M} {I_{i,j}^{R} (R_{i,j} - g(U_{i}^{T} V_{j} ))^{2} } } - \frac{1}{{2\sigma_{D}^{2} }}\sum\limits_{i = 1}^{N} {\sum\limits_{k = 1}^{N} {I_{i,k}^{D} (D_{i,k} - g(U_{i}^{T} U_{k} ))^{2} } } \\ & \quad - \frac{1}{{2\sigma_{S}^{2} }}\sum\limits_{j = 1}^{M} {\sum\limits_{p = 1}^{M} {I_{j,p}^{S} (S_{j,p} - g(V_{j}^{T} V_{p} ))^{2} } } - \frac{1}{{2\sigma_{U}^{2} }}\sum\limits_{i = 1}^{N} {U_{i}^{T} U_{i} } - \frac{1}{{2\sigma_{V}^{2} }}\sum\limits_{j = 1}^{M} {V_{j}^{T} V_{j} } \nonumber\\ & \quad - \frac{1}{2}(MN\,\ln \sigma_{R}^{2} + N^{2} \,\ln \sigma_{D}^{2} + M^{2} \,\ln \sigma_{S}^{2} + NL\,\ln \sigma_{U}^{2} + ML\,\ln \sigma_{V}^{2} ) + C \nonumber \end{aligned} $$
(6)

where C is a constant that does not depend on the parameters. Maximizing the log-posterior (Eq. 6) over U and V with the variances kept fixed is equivalent to minimizing the following sum-of-squared-errors objective function:

$$\begin{aligned} E & = \frac{1}{2}\sum\limits_{i = 1}^{N} {\sum\limits_{j = 1}^{M} {I_{i,j}^{R} (R_{i,j} - g(U_{i}^{T} V_{j} ))^{2} } } + \frac{{\sigma_{R}^{2} }}{{2\sigma_{D}^{2} }}\sum\limits_{i = 1}^{N} {\sum\limits_{k = 1}^{N} {I_{i,k}^{D} (D_{i,k} - g(U_{i}^{T} U_{k} ))^{2}}} \\ & \quad + \frac{{\sigma_{R}^{2} }}{{2\sigma_{S}^{2} }}\sum\limits_{j = 1}^{M} {\sum\limits_{p = 1}^{M} {I_{j,p}^{S} (S_{j,p} - g(V_{j}^{T} V_{p} ))^{2} } } + \frac{{\lambda_{U} }}{2}\sum\limits_{i = 1}^{N} {U_{i}^{T} U_{i} } + \frac{{\lambda_{V} }}{2}\sum\limits_{j = 1}^{M} {V_{j}^{T} V_{j} }\nonumber \end{aligned} $$
(7)

where \( \lambda_{U} = \sigma_{R}^{2} /\sigma_{U}^{2} \) and \( \lambda_{V} = \sigma_{R}^{2} /\sigma_{V}^{2} \). A local minimum of the above objective function can be found by performing gradient descent on \( U_{i} \) and \( V_{j} \), with gradients:

$$ \begin{aligned} \frac{\partial E}{{\partial U_{i} }} & = \sum\limits_{j = 1}^{M} {I_{i,j}^{R} (g(U_{i}^{T} V_{j} ) - R_{i,j} )} g^{\prime}(U_{i}^{T} V_{j} )V_{j} \\ & \quad + \frac{{2\sigma_{R}^{2} }}{{\sigma_{D}^{2} }}\sum\limits_{k = 1}^{N} {I_{i,k}^{D} (g(U_{i}^{T} U_{k} ) - D_{i,k} )g^{\prime}(U_{i}^{T} U_{k} )U_{k} } + \lambda_{U} U_{i} \\ \frac{\partial E}{{\partial V_{j} }} & = \sum\limits_{i = 1}^{N} {I_{i,j}^{R} (g(U_{i}^{T} V_{j} ) - R_{i,j} )} g^{\prime}(U_{i}^{T} V_{j} )U_{i} \\ & \quad + \frac{{2\sigma_{R}^{2} }}{{\sigma_{S}^{2} }}\sum\limits_{p = 1}^{M} {I_{j,p}^{S} (g(V_{j}^{T} V_{p} ) - S_{j,p} )g^{\prime}(V_{j}^{T} V_{p} )V_{p} } + \lambda_{V} V_{j} \\ \end{aligned} $$
(8)

where \( g^{\prime}(x) = \exp (x)/(1 + \exp (x))^{2} \) is the derivative of the logistic function. To reduce the model complexity, we set \( \lambda_{U} = \lambda_{V} \) in all the experiments in Sect. 4.
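As a reading aid, the following is a minimal dense NumPy sketch of one training run under Eqs. (7) and (8); the variable names, initialization scale, and default hyperparameters are our illustrative choices rather than the paper's code. A practical implementation would iterate only over the nonzero entries of R, D, and S, which yields the per-iteration cost derived in Subsect. 3.4.

```python
import numpy as np

def g(x):                                  # logistic function
    return 1.0 / (1.0 + np.exp(-x))

def dg(x):                                 # g'(x) = exp(x) / (1 + exp(x))^2
    s = g(x)
    return s * (1.0 - s)

def train_iu_pmf(R, I_R, D, I_D, S, I_S, L=10, lam=0.1,
                 w_D=0.1, w_S=0.1, lr=0.001, iters=100, seed=0):
    """Batch gradient descent on the objective in Eq. (7).
    R holds the [0,1]-mapped ratings with indicator I_R; D and S are the
    similarity matrices with indicators I_D and I_S. The weights
    w_D = sigma_R^2 / sigma_D^2 and w_S = sigma_R^2 / sigma_S^2 are treated
    as tunable hyperparameters, and lam = lambda_U = lambda_V."""
    rng = np.random.default_rng(seed)
    N, M = R.shape
    U = 0.1 * rng.standard_normal((L, N))  # latent user factors (columns)
    V = 0.1 * rng.standard_normal((L, M))  # latent item factors (columns)
    for _ in range(iters):
        UtV, UtU, VtV = U.T @ V, U.T @ U, V.T @ V
        eR = I_R * (g(UtV) - R) * dg(UtV)  # rating residuals (Eq. 8)
        eD = I_D * (g(UtU) - D) * dg(UtU)  # user-similarity residuals
        eS = I_S * (g(VtV) - S) * dg(VtV)  # item-similarity residuals
        gradU = V @ eR.T + 2 * w_D * (U @ eD) + lam * U
        gradV = U @ eR + 2 * w_S * (V @ eS) + lam * V
        U -= lr * gradU                    # gradient descent step on U
        V -= lr * gradV                    # gradient descent step on V
    return U, V
```

The D and S terms exploit the symmetry of the similarity matrices, so the column-wise sums \( \sum_{k} e^{D}_{i,k} U_{k} \) and \( \sum_{p} e^{S}_{j,p} V_{p} \) in Eq. (8) reduce to the matrix products `U @ eD` and `V @ eS`.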

3.4 Complexity Analysis

The IU-PMF model consists of two phases: the computation of the item and user similarity matrices D and S, and the minimization of the objective function in Eq. (7). The first phase does not need to run frequently: as demonstrated in Subsect. 4.4, it already achieves good results when 80% of the observations are involved, which means that when a handful of new data arrives the similarity matrices need not be recomputed. Suppose we have T observations, N users, and M items. The average number of ratings per user is \( T/N \), so for any two users the complexity of computing their similarity is \( O(T/N) \), and the complexity of computing the similarity matrix D is \( O(N*N*(T/N)) = O(NT) \). Similarly, the complexity of computing the similarity matrix S is \( O(MT) \). The second phase is similar to PMF, which scales linearly with the number of observations. Owing to the sparsity of the matrices R, D, and S, the computational complexity of evaluating the objective function \( E \) is \( O(\rho_{R} + \rho_{D} + \rho_{S} ) \), and the complexities of the gradients \( \frac{\partial E}{\partial U} \) and \( \frac{\partial E}{\partial V} \) are \( O(\rho_{R} + \rho_{D} ) \) and \( O(\rho_{R} + \rho_{S} ) \), respectively, where \( \rho_{R} \), \( \rho_{D} \), and \( \rho_{S} \) are the numbers of nonzero entries in R, D, and S. Therefore, the total computational complexity of one iteration is \( O(\rho_{R} + \rho_{D} + \rho_{S} ) \), i.e., linear in the number of observations. This analysis shows that the proposed approach can be applied efficiently to very large datasets.
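As a rough illustration using the dataset scales of Sect. 4.1: with \( N \approx 6 \times 10^{3} \) users, \( M \approx 4 \times 10^{3} \) items, and \( T \approx 10^{6} \) ratings, the one-off similarity phase costs on the order of \( NT + MT \approx 10^{10} \) elementary operations, whereas each training iteration touches only the \( \rho_{R} + \rho_{D} + \rho_{S} \) nonzero entries; this back-of-envelope figure explains why the similarity phase should be run only occasionally.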

4 Experimental Evaluation

In this section, we first describe the experimental datasets and then present the effectiveness comparison with other state-of-the-art approaches. After that, we analyze the influence of the parameters on IU-PMF and further explore its ability to alleviate the sparsity problem. Finally, we study the effect of the similarity matrices on IU-PMF.

4.1 Experimental Datasets

We use three real-world datasets in our experiments. The first is MovieLens-1M, released in February 2003 by MovieLens, which consists of 1,000,209 ratings from 6,040 anonymous users on 3,952 movie titles. The other two are book and music rating datasets crawled from the Douban website; users and items with very few ratings (only three ratings), which are statistically meaningless, were removed. We finally obtain 1,030,701 valid book ratings from 4,705 users on 3,876 books, and 1,173,540 valid music ratings from 7,924 users on 4,759 music titles. The three datasets are described in Table 1.

Table 1. The description of the experimental datasets

4.2 Effectiveness Comparison

To assess the performance of IU-PMF, we compare the following models. Among them, only NHPMF requires extra tagging data; to ensure fairness, we use the rating data instead of tagging data to select neighbors in NHPMF. Unless the similarity function is specified, PCC is the default.

  1. Item-KNN: the regular K-nearest-neighbor algorithm based on item similarity.

  2. User-KNN: the regular K-nearest-neighbor algorithm based on user similarity.

  3. PMF: the standard probabilistic matrix factorization model, which is popular and widely used.

  4. NHPMF: a neighborhood-aware PMF model that requires tagging data.

  5. SBMF: a similarity-fusion model based on matrix factorization.

  6. IU-PMF (CS): our proposed model using cosine similarity.

  7. IU-PMF (PCC): our proposed model using PCC to compute similarity.

In the comparison experiment, we split each dataset into two parts: 90% for training and the remaining 10% for testing, and we use five-fold cross-validation on the training set to select the optimal parameters. For Item-KNN and User-KNN, we select the optimal neighborhood size from {5, 10, 15, 20, 25, …, 100}. We train the MF-based methods with batch gradient descent, setting the learning rate to 0.001 and selecting the regularization parameters from {0.0005, 0.01, 0.05, 0.1, 0.2, 0.4, 0.8, 1.2}. We employ RMSE to assess the quality of the recommendations for \( L = 10 \) and \( L = 20 \); Table 2 displays the comparison results.
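For clarity, the reported RMSE can be computed as in the following sketch, which assumes the column layout of the training sketch in Sect. 3.3 and maps the bounded predictions \( g(U_{i}^{T} V_{j}) \) back to the original 1..K rating scale by inverting \( t(x) \); the function name and triple format are our illustrative assumptions.

```python
import numpy as np

def rmse(U, V, test_triples, K=5):
    """RMSE over held-out (user, item, rating) triples on the 1..K scale.
    The prediction g(U_i^T V_j) lies in [0, 1] and is mapped back to the
    rating scale via the inverse of t(x) = (x - 1) / (K - 1)."""
    se = 0.0
    for i, j, r in test_triples:
        pred = 1.0 / (1.0 + np.exp(-(U[:, i] @ V[:, j])))  # g(U_i^T V_j)
        se += (pred * (K - 1) + 1.0 - r) ** 2              # back to 1..K
    return np.sqrt(se / len(test_triples))
```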

Table 2. The effectiveness comparison results

From Table 2, we observe that Item-KNN outperforms User-KNN, and that the MF-based approaches perform better with \( L = 10 \) than with \( L = 20 \) on all three datasets. We also find that the MF-based models perform much better than the neighborhood-aware methods in terms of RMSE, which further confirms the effectiveness of the MF assumption. Table 2 also shows that NHPMF, which combines the advantages of neighborhood-aware methods and the PMF model, reduces the average RMSE over PMF by 0.0187 when \( L = 10 \) and by 0.0169 when \( L = 20 \), demonstrating that incorporating similarity information into the PMF model is effective. Moreover, SBMF outperforms NHPMF in every case, which indicates the benefit of integrating local preference information into MF. As for our IU-PMF model, IU-PMF with PCC achieves clearly lower RMSE than IU-PMF with cosine similarity in most cases, so PCC is more suitable for IU-PMF in our experiments. Additionally, IU-PMF clearly outperforms the other state-of-the-art methods. We therefore conclude that the similarity relationships hidden in the user-item ratings have an important influence on the recommendation results, and that IU-PMF, which concisely maps these similarity relationships into the latent feature space, is more effective than all the baseline models on the three datasets.

4.3 Impacts of Parameters \( \lambda_{U} \) and \( \lambda_{V} \)

In the IU-PMF model, the parameters \( \lambda_{U} \) and \( \lambda_{V} \) play an important role in counteracting over-fitting. In this subsection we explore their impact. To reduce complexity, we set \( \lambda_{U} = \lambda_{V} = \lambda \) in the experiment, enumerating \( \lambda \) over {0.0005, 0.01, 0.05, 0.1, 0.2, 0.4, 0.8, 1.2}, and we only probe the performance under the optimal configuration from the previous subsection (\( L = 10 \) with PCC); the other experimental conditions remain unchanged.
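This sweep can be written compactly; the following hypothetical snippet simply reuses the `train_iu_pmf` and `rmse` sketches above and assumes the rating, indicator, and similarity arrays as well as the validation triples `val_triples` are already prepared.

```python
# Hypothetical lambda sweep mirroring Subsect. 4.3 (lambda_U = lambda_V).
lambdas = [0.0005, 0.01, 0.05, 0.1, 0.2, 0.4, 0.8, 1.2]
scores = {}
for lam in lambdas:
    U, V = train_iu_pmf(R, I_R, D, I_D, S, I_S, L=10, lam=lam, lr=0.001)
    scores[lam] = rmse(U, V, val_triples, K=5)   # validation RMSE
best_lam = min(scores, key=scores.get)           # lowest RMSE wins
```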

Figure 2 shows the impact of \( \lambda_{U} \) and \( \lambda_{V} \) on RMSE. We observe that their values significantly affect the recommendation results. As \( \lambda \) increases from 0.0005, RMSE gradually decreases (lower is better); however, past a certain threshold, RMSE starts to increase. Figure 2 also suggests that IU-PMF counters under-fitting better than it counters over-fitting (when \( \lambda \) exceeds the threshold, performance degrades relatively slowly), possibly because incorporating similarity information alongside the rating data improves the model's learning ability.

Fig. 2. Impacts of parameters \( \lambda_{U} \) and \( \lambda_{V} \)

4.4 The Effect of Similarity Matrices D and S

The main advantage of the IU-PMF model is that it incorporates similarity information abstracted from the user-item ratings into PMF, which helps the model capture the hidden information in the ratings. From the complexity analysis (Subsect. 3.4), we know that computing the item and user similarity matrices is expensive. To probe the scalability of IU-PMF, we conduct an experiment exploring how the proportion of the training data used in the similarity matrix computation phase affects IU-PMF. We select the first 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, and 100% of the training data in chronological order to compute the similarity matrices (chronological order is used because we want to verify that, when a handful of new data arrives, similarity matrices computed from historical data need not be recomputed), then train the IU-PMF model on the full training set, and finally evaluate RMSE on the testing set; a sketch of this protocol follows. The other experimental conditions are the same as in Subsect. 4.3.
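A hypothetical outline of this protocol, reusing the earlier sketches, is shown below; `ratings_to_matrix` and the timestamp-sorted list `train_sorted` are assumed helpers, not code from the paper.

```python
# Hypothetical protocol for Subsect. 4.4: similarity matrices computed from a
# growing chronological prefix, training always on the full training set.
fractions = [0.60, 0.65, 0.70, 0.75, 0.80, 0.85, 0.90, 0.95, 1.00]
for frac in fractions:
    cutoff = int(frac * len(train_sorted))       # train_sorted: by timestamp
    R_sub = ratings_to_matrix(train_sorted[:cutoff], N, M)
    D, I_D = user_similarity(R_sub)              # user similarities (Sect. 3.2)
    S, I_S = user_similarity(R_sub.T)            # item similarities on R^T
    U, V = train_iu_pmf(R_full, I_R_full, D, I_D, S, I_S)
    print(frac, rmse(U, V, test_triples, K=5))   # RMSE on the testing set
```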

From Fig. 3, we observe that the similarity matrices have a great influence on the recommendation results (on average, the more training samples involved in computing the similarity matrices, the lower the corresponding RMSE). Figure 3 also shows that when 80% or more of the training set is involved in the similarity matrix computation phase, the IU-PMF model already achieves good results. We conclude that the similarity matrices need not be computed frequently, so the efficiency of the IU-PMF model is determined by the training time of the fused PMF model, which scales linearly with the number of observations.

Fig. 3. The effect of similarity matrices on IU-PMF

5 Conclusion and Future Work

In this paper, based on the intuition that item similarity and user similarity affect users' behavior toward items, we present a novel and concise recommender framework that incorporates item and user similarity abstracted from the observed user-item ratings into the PMF model. The experimental results show that our approach outperforms state-of-the-art baseline models, and the complexity analysis indicates that it is scalable to large datasets.

One of the core parts of the IU-PMF model is the user (item) similarity computation, which has a great influence on the recommendation results. In this paper we compared PCC with cosine similarity in terms of RMSE and ultimately selected PCC to compute the similarity matrices. Although the proposed similarity computation method achieves good results, we believe it is worth further study. Specifically, one could use a kernel function, such as a Gaussian or polynomial kernel, to map the relation between two vectors into a nonlinear space and thereby further enhance the nonlinear learning ability of IU-PMF. Additionally, in its present form our model could also be used with similarities computed from content-based features, so we will conduct further research on content-based features to apply our model to content-based recommendation.