Link prediction in heterogeneous data via generalized coupled tensor factorization

Ermiş, Beyza; Acar, Evrim; Cemgil, A. Taylan

doi:10.1007/s10618-013-0341-y

Published: 24 December 2013

Volume 29, pages 203–236 (2015)

Data Mining and Knowledge Discovery

Beyza Ermiş¹,
Evrim Acar² &
A. Taylan Cemgil¹

Abstract

This study deals with missing link prediction, the problem of predicting the existence of missing connections between entities of interest. We approach the problem as filling in missing entries in a relational dataset represented by several matrices and multiway arrays, that will be simply called tensors. Consequently, we address the link prediction problem by data fusion formulated as simultaneous factorization of several observation tensors where latent factors are shared among each observation. Previous studies on joint factorization of such heterogeneous datasets have focused on a single loss function (mainly squared Euclidean distance or Kullback–Leibler-divergence) and specific tensor factorization models (CANDECOMP/PARAFAC and/or Tucker). However, in this paper, we study various alternative tensor models as well as loss functions including the ones already studied in the literature using the generalized coupled tensor factorization framework. Through extensive experiments on two real-world datasets, we demonstrate that (i) joint analysis of data from multiple sources via coupled factorization significantly improves the link prediction performance, (ii) selection of a suitable loss function and a tensor factorization model is crucial for accurate missing link prediction and loss functions that have not been studied for link prediction before may outperform the commonly-used loss functions, (iii) joint factorization of datasets can handle difficult cases, such as the cold start problem that arises when a new entity enters the dataset, and (iv) our approach is scalable to large-scale data.

Notes

Some of the listed studies do not impose nonnegativity constraints on the factor matrices while GCTF assumes that all factor matrices are nonnegative.
Table 1 Related studies on coupled factorization of heterogeneous data
http://www.cse.ust.hk/~vincentz/aaai10.uclaf.data.mat.
http://www.public.esu.edu/~ylin56/kdd09sup.html.

Acknowledgments

This work is funded by the TUBITAK Grant Number 110E292, Bayesian matrix and tensor factorisations (BAYTEN) and Boğaziçi University Research Fund BAP5723. It is also funded in part by the Danish Council for Independent Research—Technology and Production Sciences and Sapere Aude Program under the Projects 11-116328 and 11-120947.

Author information

Authors and Affiliations

Department of Computer Science, Boğaziçi University, Bebek, 34342, Istanbul, Turkey
Beyza Ermiş & A. Taylan Cemgil
Faculty of Life Sciences, University of Copenhagen, 1958, Frederiksberg C, Denmark
Evrim Acar

Corresponding author

Correspondence to Beyza Ermiş.

Additional information

Responsible editor: Jian Pei.

Appendix

1.1 Computation for common factors

Here, we show the computation for $A$:

$$\begin{aligned} \varDelta _{A,1}(Q)&= \left[ \sum \limits _{j,k} Q^{i,j,k} \left( B^{j,r} C^{k,r}\right) \right] = Q_1(BC), \\ \varDelta _{A,2}(Q)&= \left[ \sum \limits _{m} Q^{i,m} \left( D^{m,r}\right) \right] = Q_2 D, \\&A \leftarrow A \circ \frac{Q_1(BC) + Q_2 D}{\hat{X}_1^{-p}(BC) + \hat{X}_2^{-p} D}, \end{aligned}$$

and $B$:

$$\begin{aligned} \varDelta _{B,1}(Q)&= \left[ \sum \limits _{i,k} Q^{i,j,k} \left( A^{i,r} C^{k,r}\right) \right] = Q_1(AC), \\ \varDelta _{B,3}(Q)&= \left[ \sum \limits _{n} Q^{j,n}\left( E^{n,r}\right) \right] = Q_3 E, \\&B \leftarrow B \circ \frac{Q_1(AC) + Q_3 E}{\hat{X}_1^{-p}(AC) + \hat{X}_3^{-p} E}, \end{aligned}$$

given in Model 1, Sect. 4.1.
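As an illustration of how such an update can be evaluated numerically, the following is a minimal NumPy sketch for the common factor $A$. It rests on several assumptions that are not stated in this appendix: the data are fully observed (no missing-entry mask), the Euclidean case is used, so that the numerator is built from the data and the denominator from the current reconstructions, and $X_1$ follows a CP model with factors $A$, $B$, $C$ while $X_2$ is the matrix product of $A$ and $D$. The function names (`reconstruct`, `delta_A`, `update_A`, `euclidean_loss`) and the small constant `eps` are hypothetical and are not taken from the paper or from any GCTF implementation.

```python
import numpy as np

def reconstruct(A, B, C, D):
    """Assumed model: X1_hat[i,j,k] = sum_r A[i,r] B[j,r] C[k,r],
    X2_hat[i,m] = sum_r A[i,r] D[m,r]."""
    X1_hat = np.einsum("ir,jr,kr->ijk", A, B, C)
    X2_hat = A @ D.T
    return X1_hat, X2_hat

def delta_A(Q1, Q2, B, C, D):
    """Delta_{A,1}(Q1) + Delta_{A,2}(Q2): sum out every index except (i, r)."""
    return np.einsum("ijk,jr,kr->ir", Q1, B, C) + Q2 @ D

def update_A(X1, X2, A, B, C, D, eps=1e-12):
    """One multiplicative update of A (Euclidean case, fully observed data).
    eps is a hypothetical safeguard against division by zero."""
    X1_hat, X2_hat = reconstruct(A, B, C, D)
    numerator = delta_A(X1, X2, B, C, D)
    denominator = delta_A(X1_hat, X2_hat, B, C, D)
    return A * numerator / (denominator + eps)

def euclidean_loss(X1, X2, A, B, C, D):
    """Sum of the squared Euclidean losses of both coupled observations."""
    X1_hat, X2_hat = reconstruct(A, B, C, D)
    return 0.5 * (np.sum((X1 - X1_hat) ** 2) + np.sum((X2 - X2_hat) ** 2))

# Synthetic example: generate coupled data from known nonnegative factors,
# then re-estimate A from a random start while B, C and D stay fixed.
rng = np.random.default_rng(0)
I, J, K, M, R = 6, 5, 4, 7, 3
A_true, B, C, D = (rng.random((n, R)) for n in (I, J, K, M))
X1, X2 = reconstruct(A_true, B, C, D)

A = rng.random((I, R))
for _ in range(50):
    A = update_A(X1, X2, A, B, C, D)
```

Because the update is multiplicative and all quantities are nonnegative, $A$ remains nonnegative throughout. Under the same assumptions, the update for $B$ is obtained by exchanging the roles of the factors and coupling with the third observation instead of the second.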

1.2 Computational complexity

We have conducted experiments on a tensor completion problem to demonstrate that the time complexity of the modeling framework is $O(N)$ for sparse datasets, where $N$ is the number of known entries. We consider two situations in these experiments: (i) a $500 \times 500 \times 500$ three-way array with 99 % missing data (1.25 million known values), and (ii) a $1{,}000 \times 1{,}000 \times 1{,}000$ three-way array with 98 % missing data (20 million known values). We have used a CP tensor factorization model with $R = 3$ components to generate the data and then added 20 % random Gaussian noise. We have then fitted a CP model using the EUC distance-based loss function and used the extracted CP factors to reconstruct the data. Figure 17 shows the average tensor completion performance of 10 independent runs in terms of RMSE score. In the $500 \times 500 \times 500$ case, all ten problems have been solved with an RMSE score around 0.20, with computation times ranging between 400 and 500 s. In the $1{,}000 \times 1{,}000 \times 1{,}000$ case, all ten problems have also been solved with an RMSE score around 0.20, with computation times ranging from 8,000 to 12,000 s. The larger case is thus approximately 20 times slower than the $500 \times 500 \times 500$ case while having 16 times more non-missing entries, which is consistent with roughly linear scaling in $N$.
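The counts quoted above, and the reason a cost proportional to the number of known entries is plausible, can be sketched as follows. The code is an illustration only: it assumes that the known entries are stored as an $N \times 3$ index array, so that the CP model is evaluated at those positions alone in $O(NR)$ operations, and the function names `cp_at_indices` and `rmse_on_known` are hypothetical rather than taken from the paper.

```python
import numpy as np

def cp_at_indices(A, B, C, idx):
    """Evaluate sum_r A[i,r] B[j,r] C[k,r] only at the rows (i, j, k) of idx.
    Cost is O(N * R) for N index rows; the dense tensor is never formed."""
    i, j, k = idx[:, 0], idx[:, 1], idx[:, 2]
    return np.sum(A[i] * B[j] * C[k], axis=1)

def rmse_on_known(values, A, B, C, idx):
    """RMSE between the known values and the CP model at the same positions."""
    residual = values - cp_at_indices(A, B, C, idx)
    return float(np.sqrt(np.mean(residual ** 2)))

# Number of known entries in the two settings (integer arithmetic).
n_small = 500 ** 3 * 1 // 100    # 1 % of 500^3 entries are known
n_large = 1000 ** 3 * 2 // 100   # 2 % of 1000^3 entries are known
entry_ratio = n_large // n_small  # 16

# Bounds on the slowdown implied by the reported ranges of computation times.
time_ratio_low = 8000 / 500    # fastest large run vs. slowest small run
time_ratio_high = 12000 / 400  # slowest large run vs. fastest small run
```

Evaluating the model only at the stored indices is one common way to keep the per-iteration cost tied to $N$ rather than to the full tensor size; whether the authors' implementation proceeds this way is not stated here.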

About this article

Cite this article

Ermiş, B., Acar, E. & Cemgil, A.T. Link prediction in heterogeneous data via generalized coupled tensor factorization. Data Min Knowl Disc 29, 203–236 (2015). https://doi.org/10.1007/s10618-013-0341-y

Received: 29 December 2012
Accepted: 02 December 2013
Published: 24 December 2013
Issue Date: January 2015
DOI: https://doi.org/10.1007/s10618-013-0341-y
