
Bayesian Inference via Variational Approximation for Collaborative Filtering


Abstract

The variational approximation method is widely used to approximate intractable probability distributions, a problem that arises in particular in Bayesian inference when estimating posterior distributions. The latent factor model is a classical model-based collaborative filtering approach that explains user-item associations by characterizing both users and items with latent factors inferred from rating patterns. Because the rating matrix is sparse, the latent factor model is prone to overfitting in practice, so additional techniques such as regularizing the model parameters or placing Bayesian priors on them are required. In this paper, two generative processes for ratings are formulated as probabilistic graphical models with their corresponding latent factors. Full Bayesian frameworks for these graphical models are proposed, together with variational inference approaches for parameter estimation. Experimental results show that the proposed Bayesian approaches outperform classical regularized matrix factorization methods.
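To make the model concrete, the following is a minimal sketch (in NumPy, not taken from the paper) of the kind of generative process the latent factor model assumes: zero-mean Gaussian priors on the user and item factors and Gaussian observation noise around their inner product, consistent with the densities used in the appendix. The dimensions, hyperparameter values, and the 10% observation rate are illustrative assumptions, and the linear term \(l(w)\) from the appendix is omitted for brevity.

```python
# Illustrative sketch of the assumed generative process (not the paper's code).
import numpy as np

rng = np.random.default_rng(0)
M, N, K = 50, 40, 5          # users, items, latent dimension (assumed values)
sigma2 = np.full(K, 1.0)     # prior variances of user factors a_ik
rho2 = np.full(K, 1.0)       # prior variances of item factors b_jk
tau2 = 0.25                  # rating noise variance

A = rng.normal(0.0, np.sqrt(sigma2), size=(M, K))   # user latent factors
B = rng.normal(0.0, np.sqrt(rho2), size=(N, K))     # item latent factors

# Ratings: Gaussian noise around the inner product of user and item factors.
R = A @ B.T + rng.normal(0.0, np.sqrt(tau2), size=(M, N))

# Sparse observation set Omega: only a small fraction of ratings is observed.
Omega = rng.random((M, N)) < 0.1
print("observed ratings:", Omega.sum())
```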



Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (Nos. 61203219, 61472335), the Natural Science Foundation of Fujian Province of China (No. 2018H0035), the Natural Science Foundation of Xiamen City of China (No. 3502Z20183011), and Fujian Shine Technology Limited Company.

Author information


Corresponding author

Correspondence to Wenxing Hong.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Proof of Lemma 1

Note that the ELBO can be written as:

$$\begin{aligned} ELBO\left( Q\right)&=E_{Q\left( A,B,w\right) }[\log P\left( A,B,w,R\right) ]-E_{Q\left( A,B,w\right) }[\log Q\left( A,B,w\right) ]\\&=E_{Q\left( A,B,w\right) }\left[ -\frac{1}{2}\sum _{i=1}^M\sum _{k=1}^K \left( \log \left( 2\pi {\sigma _k}^2\right) +\frac{a_{ik}^2}{{\sigma _k}^2}\right) -\frac{1}{2}\sum _{j=1}^N\sum _{k=1}^K \left( \log \left( 2\pi {\rho _k}^2\right) +\frac{b_{jk}^2}{{\rho _k}^2}\right) \right. \\&\quad \left. -\frac{1}{2}\sum _{l=1}^{L_w}\left( \log 2\pi +w_l^2\right) -\frac{1}{2}\sum _{\left( i,j\right) \in \varOmega }\left( \log \left( 2\pi \tau ^2\right) + \frac{\left( r_{ij}-\hat{r}_{ij}\right) ^2}{\tau ^2}\right) \right] \\&\quad -E_{Q\left( A\right) }\left( \log Q\left( A\right) \right) -E_{Q\left( B\right) }\left( \log Q\left( B\right) \right) -E_{Q\left( w\right) }\left( \log Q\left( w\right) \right) \\&=-\frac{M}{2}\sum _{k=1}^K\log \left( 2\pi {\sigma _k}^2\right) -\frac{N}{2}\sum _{k=1}^K\log \left( 2\pi {\rho _k}^2\right) -\frac{L_w}{2}\log \left( 2\pi \right) -\frac{|\varOmega |}{2}\log \left( 2\pi \tau ^2\right) \\&\quad -\frac{1}{2}\sum _{k=1}^K\left( \frac{\sum _{i=1}^ME_{Q\left( A\right) }\left( a_{ik}^2\right) }{{\sigma _k}^2} +\frac{\sum _{j=1}^NE_{Q\left( B\right) }\left( b_{jk}^2\right) }{{\rho _k}^2}\right) -\frac{1}{2}\sum _{l=1}^{L_w}E_{Q\left( w\right) }\left( w_l^2\right) \\&\quad -\frac{1}{2}\sum _{\left( i,j\right) \in \varOmega }\frac{E_{Q\left( A\right) Q\left( B\right) Q\left( w\right) }\left( r_{ij}-\hat{r}_{ij}\right) ^2}{\tau ^2}\\&\quad -E_{Q\left( A\right) }\left( \log Q\left( A\right) \right) -E_{Q\left( B\right) }\left( \log Q\left( B\right) \right) -E_{Q\left( w\right) }\left( \log Q\left( w\right) \right) \end{aligned}$$
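For completeness, recall the standard identity (not specific to this paper) that connects the ELBO to the marginal likelihood of the observed ratings and justifies maximizing it:

$$\begin{aligned} \log P\left( R\right) =ELBO\left( Q\right) +KL\left( Q\left( A,B,w\right) \,\Vert \,P\left( A,B,w|R\right) \right) \ge ELBO\left( Q\right) . \end{aligned}$$

Since \(\log P(R)\) does not depend on Q, maximizing the ELBO over the factorized family \(Q(A)Q(B)Q(w)\) is equivalent to minimizing the KL divergence between Q and the true posterior.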

To obtain the optimal Q(A), we maximize the ELBO while fixing \(Q(B)\) and \(Q(w)\). This gives,

$$\begin{aligned} \log Q\left( A\right)&=E_{Q\left( B\right) Q\left( w\right) }[\log p\left( R,A,B,w\right) ] \propto E_{Q\left( B\right) Q\left( w\right) }[\log p\left( R|A,B,w\right) +\log p\left( A\right) ]\\&=-\frac{1}{2}\sum _{k=1}^K\sum _{i=1}^M\frac{a_{ik}^2}{{\sigma _k}^2} - \frac{1}{2}\sum _{\left( i,j\right) \in \varOmega } \frac{E_{Q\left( B\right) Q\left( w\right) }\left( r_{ij}-a_i^Tb_j-l\left( w\right) \right) ^2}{\tau ^2}\\&\propto -\frac{1}{2}\sum _{i=1}^M\left[ a_i^T\varLambda _1 a_i+ \sum _{j\in N\left( i\right) } \frac{a_i^TE\left( b_jb_j^T\right) a_i-2a_i^TE_{Q\left( B\right) }\left( b_j\right) E_{Q\left( w\right) }\left( r_{ij}-l\left( w\right) \right) }{\tau ^2}\right] \\&=-\frac{1}{2}\sum _{i=1}^M\left[ a_i^T\left( \varLambda _1+\sum _{j\in N\left( i\right) }\frac{\varPsi _j+{\bar{b}}_j {\bar{b}}_j^T}{\tau ^2}\right) a_i -2a_i^T\sum _{j\in N\left( i\right) } \frac{{\bar{b}}_j\left( r_{ij}-l\left( \bar{w}\right) \right) }{\tau ^2}\right] \\&\propto -\frac{1}{2}\sum _{i=1}^M \left( a_i-{\bar{a}}_i\right) ^T\varPhi _i^{-1}\left( a_i-{\bar{a}}_i\right) \end{aligned}$$

Thus, Q(A) is given by:

$$\begin{aligned}&Q\left( A\right) \propto \prod _{i=1}^M \exp \left( -\frac{1}{2}\left( a_i-{\bar{a}}_i\right) ^T\varPhi _i^{-1}\left( a_i-{\bar{a}}_i \right) \right) \\&\varLambda _1=\begin{pmatrix} \frac{1}{\sigma _1^2}&{} &{}0\\ &{} \ddots &{}\\ 0&{}&{}\frac{1}{\sigma _K^2} \end{pmatrix}, \quad \varPhi _i=\left( \varLambda _1+\sum _{j\in N\left( i\right) } \frac{\varPsi _j+{\bar{b}}_j{\bar{b}}_j^T}{\tau ^2}\right) ^{-1}, \\&{\bar{a}}_i =\varPhi _i\sum _{j\in N\left( i\right) }\frac{{\bar{b}}_j\left( r_{ij}-l\left( \bar{w}\right) \right) }{\tau ^2}, \end{aligned}$$

where N(i) is the set of j’s such that \(r_{ij}\) is observed, \(\varPhi _i\) and \({\bar{a}}_i\) are the covariance and the mean of \(a_i\), \(\varPsi _j\) and \({\bar{b}}_j\) are the covariance and the mean of \(b_j\), and \(\bar{w}\) is the mean of w. The optimal Q(B) is obtained in the same way.

$$\begin{aligned} \log Q(B)&=E_{Q(A)Q(w)}[\log p(R,A,B,w)] \\&\propto -\frac{1}{2}\sum _{k=1}^K\sum _{j=1}^N\frac{b_{jk}^2}{{\rho _k}^2} - \frac{1}{2}\sum _{(i,j)\in \varOmega } \frac{E_{Q(A)Q(w)}(r_{ij}-a_i^Tb_j-l(w))^2}{\tau ^2}\\&\propto -\frac{1}{2}\sum _{j=1}^N\left[ b_j^T\varLambda _2 b_j+\sum _{i\in N(j)}\frac{b_j^T(\varPhi _i+{\bar{a}}_i {\bar{a}}_i^T)b_j-2b_j^T{\bar{a}}_i(r_{ij}-l(\bar{w}))}{\tau ^2}\right] \\&\propto -\frac{1}{2}\sum _{j=1}^N (b_j-{\bar{b}}_j)^T\varPsi _j^{-1}(b_j-{\bar{b}}_j) \end{aligned}$$

Thus, Q(B) is given by:

$$\begin{aligned}&Q\left( B\right) \propto \prod _{j=1}^N\exp \left( -\frac{1}{2}\left( b_j-{\bar{b}}_j\right) ^T\varPsi _j^{-1}\left( b_j-{\bar{b}}_j\right) \right) \\&\varLambda _2=\begin{pmatrix} \frac{1}{\rho _1^2}&{} &{}0\\ &{} \ddots &{}\\ 0&{}&{}\frac{1}{\rho _K^2} \end{pmatrix}, \quad \varPsi _j=\left( \varLambda _2+\sum _{i\in N\left( j\right) } \frac{\varPhi _i+{\bar{a}}_i{\bar{a}}_i^T}{\tau ^2}\right) ^{-1},\\&{\bar{b}}_j =\varPsi _j\sum _{i\in N\left( j\right) }\frac{{\bar{a}}_i\left( r_{ij}-l\left( \bar{w}\right) \right) }{\tau ^2}, \end{aligned}$$

where N(j) is the set of i’s such that \(r_{ij}\) is observed.
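As an illustration only, the per-user coordinate update above can be implemented as follows; the function name and array layout are my own assumptions rather than the paper's code, and the item update for \(Q(b_j)\) is symmetric with the roles of users and items exchanged.

```python
# Hypothetical sketch of the coordinate update for Q(a_i) derived above.
# Names (`update_user`, argument layout) are assumptions, not the paper's code.
import numpy as np

def update_user(r_i, B_mean, B_cov, lw_mean, sigma2, tau2):
    """Return the covariance Phi_i and mean a_bar_i for one user.

    r_i     : (n_i,) observed ratings r_ij of user i, j in N(i)
    B_mean  : (n_i, K) posterior means b_bar_j of the rated items
    B_cov   : (n_i, K, K) posterior covariances Psi_j of the rated items
    lw_mean : scalar, l(w_bar)
    sigma2  : (K,) prior variances sigma_k^2 of the user factors
    tau2    : scalar rating-noise variance
    """
    # Precision: Lambda_1 + sum_j (Psi_j + b_bar_j b_bar_j^T) / tau^2
    precision = np.diag(1.0 / sigma2) + (B_cov.sum(axis=0) + B_mean.T @ B_mean) / tau2
    Phi_i = np.linalg.inv(precision)
    # Mean: Phi_i sum_j b_bar_j (r_ij - l(w_bar)) / tau^2
    a_bar_i = Phi_i @ (B_mean.T @ (r_i - lw_mean)) / tau2
    return Phi_i, a_bar_i
```

The same routine, with \(\varLambda _2\), \(\varPhi _i\) and \({\bar{a}}_i\) in place of \(\varLambda _1\), \(\varPsi _j\) and \({\bar{b}}_j\), yields \(\varPsi _j\) and \({\bar{b}}_j\) for each item.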

Assume the linear function \(l(w)=x^Tw\), where \(x\) denotes the known sample features.

$$\begin{aligned} \log Q\left( w\right)&=E_{Q\left( A\right) Q\left( B\right) }[\log p\left( R,A,B,w\right) ] \propto E_{Q\left( A\right) Q\left( B\right) }[\log p\left( R|A,B,w\right) +\log p\left( w\right) ]\\&=E_{Q\left( A\right) Q\left( B\right) }\left[ -\frac{1}{2}\sum _{\left( i,j\right) \in \varOmega }\frac{\left( r_{ij}-a_i^Tb_j-x^Tw\right) ^2}{\tau ^2}-\frac{1}{2}\sum _{l=1}^{L_w}w_l^2\right] \\&\propto -\frac{1}{2}\sum _{\left( i,j\right) \in \varOmega }E_{Q\left( A\right) Q\left( B\right) }\frac{-2x^Tw\left( r_{ij}-a_i^Tb_j\right) +w^Txx^Tw}{\tau ^2}-\frac{1}{2}w^Tw\\&=-\frac{1}{2}\left[ w^T\left( I+\sum _{\left( i,j\right) \in \varOmega }\frac{xx^T}{\tau ^2}\right) w-2w^T\sum _{\left( i,j\right) \in \varOmega }\frac{x\left( r_{ij}-{\bar{a}}_i^T{\bar{b}}_j\right) }{\tau ^2}\right] \\&\propto -\frac{1}{2}\left( w-\bar{w}\right) ^T\varDelta ^{-1}\left( w-\bar{w}\right) ,\qquad Q\left( w\right) \propto \exp \left( -\frac{1}{2}\left( w-\bar{w}\right) ^T\varDelta ^{-1}\left( w-\bar{w}\right) \right) \\&\varDelta =\left( I+\sum _{\left( i,j\right) \in \varOmega }\frac{xx^T}{\tau ^2}\right) ^{-1},\qquad \bar{w}=\varDelta \sum _{\left( i,j\right) \in \varOmega }\frac{x\left( r_{ij}-{\bar{a}}_i^T{\bar{b}}_j\right) }{\tau ^2} \end{aligned}$$

Therefore, the locally optimal \(Q(A,B,w)=Q(A)Q(B)Q(w)\) is obtained. \(\square \)
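A similarly hedged sketch of the update for \(Q(w)\) derived in the proof: here I assume, purely for illustration, one feature row \(x\) per observed rating, and the function and variable names are hypothetical rather than the paper's implementation.

```python
# Hypothetical sketch of the Q(w) update; `update_w` and the per-observation
# feature matrix are illustrative assumptions, not the paper's code.
import numpy as np

def update_w(residuals, X, tau2):
    """Return the covariance Delta and mean w_bar of Q(w).

    residuals : (n_obs,) values r_ij - a_bar_i^T b_bar_j over all (i, j) in Omega
    X         : (n_obs, L_w) feature rows x entering l(w) = x^T w
    tau2      : scalar rating-noise variance
    """
    L_w = X.shape[1]
    # Precision: I + sum_{(i,j) in Omega} x x^T / tau^2
    Delta = np.linalg.inv(np.eye(L_w) + (X.T @ X) / tau2)
    # Mean: Delta sum_{(i,j) in Omega} x (r_ij - a_bar_i^T b_bar_j) / tau^2
    w_bar = Delta @ (X.T @ residuals) / tau2
    return Delta, w_bar
```

A full coordinate-ascent implementation would cycle the three updates, for \(Q(A)\), \(Q(B)\) and \(Q(w)\), monitoring the ELBO of Lemma 1 until it stops increasing.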

About this article

Cite this article

Weng, Y., Wu, L. & Hong, W. Bayesian Inference via Variational Approximation for Collaborative Filtering. Neural Process Lett 49, 1041–1054 (2019). https://doi.org/10.1007/s11063-018-9841-5
