Skip to main content
Log in

TensorCast: forecasting and mining with coupled tensors

Knowledge and Information Systems Aims and scope Submit manuscript

Abstract

Given an heterogeneous social network, can we forecast its future? Can we predict who will start using a given hashtag on twitter? Can we leverage side information, such as who retweets or follows whom, to improve our membership forecasts? We present TensorCast, a novel method that forecasts time-evolving networks more accurately than current state-of-the-art methods by incorporating multiple data sources in coupled tensors. TensorCast is (a) scalable, being linearithmic on the number of connections; (b) effective, achieving over 20% improved precision on top-1000 forecasts of community members; (c) general, being applicable to data sources with different structure. We run our method on multiple real-world networks, including DBLP, epidemiology data, power grid data, and a Twitter temporal network with over 310 million nonzeros, where we predict the evolution of the activity of the use of political hashtags.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13
Fig. 14
Fig. 15
Fig. 16
Fig. 17
Fig. 18

Notes

  1. One of the experiments in Sect. 5 deals with this scenario.

  2. For instance, we know that the second biggest element is one of \(\varvec{a}_{2}\varvec{b}_1\varvec{s}_1\), \(\varvec{a}_{1}\varvec{b}_{2}\varvec{s}_1\) or \(\varvec{a}_{1}\varvec{b}_1\varvec{s}_{2}\).

  3. Remember that \(\varvec{A}\) and \(\varvec{B}\) are nonnegative matrices. In the worst-case, the score of the Kth biggest element is taken from a single power-law and the contribution of the rest of the factors is 0, hence \(K^{-\alpha _m}\) is a lower-bound for the Kth biggest value.

  4. In the worst-case scenario, this element is at position x in every of the factors.

  5. We consider the sum of the nonzeros of both tensors.

  6. Note that the quality of absolute precision numbers is affected by (1) how imbalanced the two classes are and (2) the cost of false positives. An improvement from 2 to 5% precision might imply that 1 out of 20 phone-calls we make target a potential customer versus every 1 in 50.

  7. https://www.tycho.pitt.edu/.

  8. https://www.cdc.gov/mmwr/preview/mmwrhtml/00056803.htm.

References

  1. Acar E, Aykut-Bingol C, Bingol H, Bro R, Yener B (2007) Multiway analysis of epilepsy tensors. Bioinformatics 23(13):i10–i18

    Article  Google Scholar 

  2. Araujo M, Günnemann S, Mateos G, Faloutsos C (2014) Beyond blocks: hyperbolic community detection. In: Joint European conference on machine learning and knowledge discovery in databases, Springer, pp 50–65

  3. Bader BW, Kolda TG et al (2015) Matlab tensor toolbox version 2.6. http://www.sandia.gov/tgkolda/TensorToolbox/. Accessed 3 Jan 2018

  4. Baiocchi G, Distaso W (2003) GRETL: econometric software for the gnu generation. J Appl Econom 18(1):105–110

    Article  Google Scholar 

  5. Beutel A, Talukdar PP, Kumar A, Faloutsos C, Papalexakis EE, Xing EP (2014) Flexifact: scalable flexible factorization of coupled tensors on hadoop. IN: SDM, SIAM, pp 109–117

  6. Box GE, Pierce DA (1970) Distribution of residual autocorrelations in autoregressive-integrated moving average time series models. J Am Stat Assoc 65(332):1509–1526

    Article  MathSciNet  MATH  Google Scholar 

  7. Breunig MM, Kriegel H-P, Ng RT, Sander J (2000) LOF: identifying density-based local outliers. In: ACM sigmod record, vol 29, ACM, pp 93–104

  8. Chandola V, Banerjee A, Kumar V (2009) Anomaly detection: a survey. ACM Comput Surv (CSUR) 41(3):15

    Article  Google Scholar 

  9. Cheng J, Adamic L, Dow PA, Kleinberg JM, Leskovec J (2014) Can cascades be predicted? In: Proceedings of the 23rd international conference on world wide web, ACM, pp 925–936

  10. Dunlavy DM, Kolda TG, Acar E (2011) Temporal link prediction using matrix and tensor factorizations. ACM Trans Knowl Discov Data (TKDD) 5(2):10

    Google Scholar 

  11. Ermiş B, Cemgil AT, Acar E (2013) Generalized coupled symmetric tensor factorization for link prediction. In: 2013 21st signal processing and communications applications conference (SIU), IEEE, pp 1–4

  12. Gao S, Denoyer L, Gallinari P (2011) Temporal link prediction by integrating content and structure information. In: Proceedings of the 20th ACM international conference on information and knowledge management, ACM, pp 1169–1174

  13. Grimmett G, Stirzaker D (2001) Probability and random processes. Oxford University Press, Oxford

    MATH  Google Scholar 

  14. Guha S, Mishra N, Roy G, Schrijvers O (2016) Robust random cut forest based anomaly detection on streams. In: International conference on machine learning, pp 2712–2721

  15. Harshman RA (1970) Foundations of the PARAFAC procedure: models and conditions for an“explanatory” multi-modal factor analysis. University of California, Los Angeles

  16. He Z, Xie S, Zdunek R, Zhou G, Cichocki A (2011) Symmetric nonnegative matrix factorization: algorithms and applications to probabilistic clustering. IEEE Trans Neural Netw 22(12):2117–2131

    Article  Google Scholar 

  17. Iasemidis LD, Sackellares JC (1996) Review: chaos theory and epilepsy. Neuroscientist 2(2):118–126

    Article  Google Scholar 

  18. Kolda TG, Bader BW (2009) Tensor decompositions and applications. SIAM Rev 51(3):455–500

    Article  MathSciNet  MATH  Google Scholar 

  19. Koren Y (2008) Factorization meets the neighborhood: a multifaceted collaborative filtering model. In: Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, pp 426–434

  20. Koren Y (2010) Collaborative filtering with temporal dynamics. Commun ACM 53(4):89–97

    Article  Google Scholar 

  21. Lee DD, Seung HS (2001) Algorithms for non-negative matrix factorization. In: Advances in neural information processing systems, pp 556–562

  22. Leskovec J, Chakrabarti D, Kleinberg J, Faloutsos C (2005) Realistic, mathematically tractable graph generation and evolution, using Kronecker multiplication. In: European conference on principles of data mining and knowledge discovery, Springer, pp 133–145

  23. Liben-Nowell D, Kleinberg J (2007) The link-prediction problem for social networks. J Am Soc Inf Sci Technol 58(7):1019–1031

    Article  Google Scholar 

  24. Matsubara Y, Sakurai Y, Faloutsos C, Iwata T, Yoshikawa M (2012) Fast mining and forecasting of complex time-stamped events. In: Proceedings of the 18th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, pp 271–279

  25. Menon AK, Elkan C (2011) Link prediction via matrix factorization. In: Joint European conference on machine learning and knowledge discovery in databases, Springer, pp 437–452

  26. Neyshabur B, Srebro N (2014) On symmetric and asymmetric LSHS for inner product search. arXiv preprint arXiv:1410.5518

  27. Papadimitriou S, Kitagawa H, Gibbons PB, Faloutsos C (2003) Loci: fast outlier detection using the local correlation integral. In: Proceedings of the 19th international conference on data engineering, 2003, IEEE, pp 315–326

  28. Papalexakis EE, Faloutsos C, Sidiropoulos ND (2012) Parcube: sparse parallelizable tensor decompositions. In: Joint European conference on machine learning and knowledge discovery in databases, Springer, pp 521–536

  29. Pasta MQ, Jan Z, Sallaberry A, Zaidi F (2013) Tunable and growing network generation model with community structures. In: 2013 third international conference on cloud and green computing (CGC), IEEE, pp 233–240

  30. Ram P, Gray AG (2012) Maximum inner-product search using cone trees. In: Proceedings of the 18th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, pp 931–939

  31. Scheffer M, Bascompte J, Brock WA, Brovkin V, Carpenter SR, Dakos V, Held H, Van Nes EH, Rietkerk M, Sugihara G (2009) Early-warning signals for critical transitions. Nature 461(7260):53

    Article  Google Scholar 

  32. Sharan U, Neville J (2008) Temporal-relational classifiers for prediction in evolving domains. In: 2008 eighth IEEE international conference on data mining, IEEE, pp 540–549

  33. Shi C, Li Y, Zhang J, Sun Y, Philip SY (2017) A survey of heterogeneous information network analysis. IEEE Trans Knowl Data Eng 29(1):17–37

    Article  Google Scholar 

  34. Shrivastava A, Li P (2014) Improved asymmetric locality sensitive hashing (ALSH) for maximum inner product search (MIPS). arXiv preprint arXiv:1410.5410

  35. Şimşekli U, Ermiş B, Cemgil, AT, Acar E (2013) Optimal weight learning for coupled tensor factorization with mixed divergences. In: 21st European signal processing conference (EUSIPCO 2013), IEEE, pp 1–5

  36. Song H A, Hooi B, Jereminov M, Pandey A, Pileggi L, Faloutsos C (2017) Powercast: mining and forecasting power grid sequences. In: Ceci M, Hollmén J, Todorovski L, Vens C, Džeroski S (eds) Machine learning and knowledge discovery in databases. ECML PKDD 2017. Lecture Notes in Computer Science, vol 10535. Springer, Cham, pp 606–621

  37. Sun J, Tao D, Faloutsos C (2006) Beyond streams and graphs: dynamic tensor analysis. In: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, pp 374–383

  38. Tao D, Maybank S, Hu W, Li X (2005) Stable third-order tensor representation for color image classification. In: Proceedings of the 2005 IEEE/WIC/ACM international conference on web intelligence, IEEE Computer Society, pp 641–644

  39. Teflioudi C, Gemulla R, Mykytiuk O (2015) Lemp: fast retrieval of large entries in a matrix product. In: Proceedings of the 2015 ACM SIGMOD international conference on management of data, ACM, pp 107–122

  40. Walker PB, Gilpin S, Fooshee S, Davidson I (2015) Constrained tensor decomposition via guidance: increased inter and intra-group reliability in FMRI analyses. In: International conference on augmented cognition, Springer, pp 361–369

  41. Welling M, Weber M (2001) Positive tensor factorization. Pattern Recognit Lett 22(12):1255–1261

    Article  MATH  Google Scholar 

  42. Xie Z, Li X, Wang X (2007) A new community-based evolving network model. Phys A Stat Mech Appl 384(2):725–732

    Article  Google Scholar 

  43. Yılmaz YK (2012) Generalized tensor factorization. Ph.D. thesis, Citeseer

  44. Zellner A (1962) An efficient method of estimating seemingly unrelated regressions and tests for aggregation bias. J Am Stat Assoc 57(298):348–368

    Article  MathSciNet  MATH  Google Scholar 

  45. Zhou X, Xiang L, Xiao-Fan W (2008) Weighted evolving networks with self-organized communities. Commun Theor Phys 50(1):261

    Article  MATH  Google Scholar 

Download references

Acknowledgements

This material is based upon work supported by the National Science Foundation under Grant No. IIS-1247489, and by the Army Research Laboratory under Cooperative Agreement Number W911NF-09-2-0053. This work was also financed by the ERDF European Regional Development Fund through the Operational Programme for Competitiveness and Internationalization - COMPETE 2020 Programme within Project POCI-01-0145-FEDER-006961, and by FCT Fundao para a Cincia e a Tecnologia (Portuguese Foundation for Science and Technology) as part of Project UID/EEA/50014/2013. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation, the Army Research Laboratory, the U.S. Government, or other funding parties. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation here on.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Miguel Araujo.

Appendix: multiplicative updates of coupled tensors factorization

Appendix: multiplicative updates of coupled tensors factorization

The nonnegative coupled tensor factorization problem

$$\begin{aligned} \min _{\varvec{A,B,C,T}}\left||\varvec{\mathcal {X}} - \displaystyle \sum _f \varvec{a_f} \circ \varvec{b_f} \circ \varvec{t_f} \right||_F^2 + \lambda \left||\varvec{\mathcal {Y}} - \displaystyle \sum _f \varvec{a_f} \circ \varvec{c_f} \circ \varvec{t_f}\right||_F^2 \end{aligned}$$

is well studied and its multiplicative update equations have been previously described in the literature (e.g., considering the dispersion parameter \(\lambda \) [35]). The solution can be found by iteratively updating

$$\begin{aligned} \varvec{A}&\leftarrow \varvec{A} \otimes \frac{\varvec{\mathcal {X}_{(1)}}(\varvec{B}\odot \varvec{T}) + \lambda \varvec{\mathcal {Y}_{(1)}}(\varvec{C}\odot \varvec{T})}{\varvec{A}(\varvec{B}\odot \varvec{T})^t(\varvec{B}\odot \varvec{T})+\lambda \varvec{A}(\varvec{C}\odot \varvec{T})^t(\varvec{C}\odot \varvec{T})} \\ \varvec{B}&\leftarrow \varvec{B} \otimes \frac{\varvec{\mathcal {X}_{(2)}}(\varvec{A}\odot \varvec{T})}{\varvec{B}(\varvec{A}\odot \varvec{T})^t(\varvec{A}\odot \varvec{T})} \\ \varvec{C}&\leftarrow \varvec{C} \otimes \frac{\varvec{\mathcal {Y}_{(2)}}(\varvec{A}\odot \varvec{T})}{\varvec{C}(\varvec{A}\odot \varvec{T})^t(\varvec{A}\odot \varvec{T})}\\ \varvec{T}&\leftarrow \varvec{T} \otimes \frac{\varvec{\mathcal {X}_{(3)}}(\varvec{A}\odot \varvec{B}) + \lambda \varvec{\mathcal {Y}_{(3)}}(\varvec{A}\odot \varvec{C})}{\varvec{T}(\varvec{A}\odot \varvec{B})^t(\varvec{A}\odot \varvec{B})+\lambda \varvec{T}(\varvec{A}\odot \varvec{C})^t(\varvec{A}\odot \varvec{C})} \end{aligned}$$

The problem is not as well understood when one of the factorizations is symmetric, e.g., \(\displaystyle \varvec{\hat{\mathcal {Y}}} = \sum _f \varvec{a_f} \circ \varvec{a_f} \circ \varvec{t_f}\), as this is no longer a linear problem.

Welling and Weber [41] note the need for a scaling exponent (for the simple, non-coupled case):

$$\begin{aligned} \varvec{A}&\leftarrow \varvec{A} \otimes \left( \frac{\varvec{\mathcal {X}_{(1)}}(\varvec{A}\odot \varvec{T})}{\varvec{A}(\varvec{A}\odot \varvec{T})^t(\varvec{A}\odot \varvec{T})}\right) ^{1/d} \end{aligned}$$

which should be at least 1 / 2 for the matrix case, although no proof is provided. To the best of our knowledge, the best theoretical bound is 1 / 3 when the matrix is semi-definite positive [16]. Empirical results (for the coupled case) indicate that removing the exponent (\(d=1\)) might eliminate the convergence guarantees, but even small perturbations converge (e.g., 0.98 in [11]).

We recommend an exponent of 1 / 3, as convergence is exponentially fast in any case.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Araujo, M., Ribeiro, P., Song, H.A. et al. TensorCast: forecasting and mining with coupled tensors. Knowl Inf Syst 59, 497–522 (2019). https://doi.org/10.1007/s10115-018-1223-9

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10115-018-1223-9

Keywords

Navigation