Skip to main content
Log in

Uncovering and predicting the dynamic process of information cascades with survival model

Knowledge and Information Systems Aims and scope Submit manuscript

Abstract

Cascades are ubiquitous in various network environments. Predicting these cascades is decidedly nontrivial in various important applications, such as viral marketing, epidemic prevention, and traffic management. Most previous works have focused on predicting the final cascade sizes. As cascades are dynamic processes, it is always interesting and important to predict the cascade size at any given time, or to predict the time when a cascade will reach a certain size (e.g., the threshold for an outbreak). In this paper, we unify all these tasks into a fundamental problem: cascading process prediction. That is, given the early stage of a cascade, can we predict its cumulative cascade size at any later time? For such a challenging problem, an understanding of the micromechanism that drives and generates the macrophenomena (i.e., the cascading process) is essential. Here, we introduce behavioral dynamics as the micromechanism to describe the dynamic process of an infected node’s neighbors getting infected by a cascade (i.e., one-hop sub-cascades). Through data-driven analysis, we find out the common principles and patterns lying in the behavioral dynamics and propose the novel NEtworked WEibull Regression model for modeling it. We also propose a novel method for predicting cascading processes by effectively aggregating behavioral dynamics and present a scalable solution to approximate the cascading process with a theoretical guarantee. We evaluate the proposed method extensively on a large-scale social network dataset. The results demonstrate that the proposed method can significantly outperform other state-of-the-art methods in multiple tasks including cascade size prediction, outbreak time prediction, and cascading process prediction.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11

Notes

  1. Here, the cascades are information cascades. When a user re-tweets/generates a post, several of his/her followers will further re-tweet the post and so on to form an information cascade.

  2. The dataset is complete and publicly available at http://www.thumedia.org.

  3. We think that a follower with a different re-tweet number will have different effects to the user, and so we modify the weights of each term \(follower\_avg\_inflow\_rate\) and \(follower\_avg\_retweet\_rate\).

  4. It will be counted multiple times if a specific user is involved in multiple cascades.

References

  1. Chen W, Collins A, Cummings R, Ke T, Liu Z, Rincon D, Sun X, Wang Y, Wei W, Yuan Y (2011) Influence maximization in social networks when negative opinions may emerge and propagate. In: SDM, vol. 11, pp 379–390. SIAM

  2. Chen W, Wang Y, Yang S (2009) Efficient influence maximization in social networks. In: Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 199–208

  3. Cheng J, Adamic L, Dow PA, Kleinberg JM, Leskovec J (2014) Can cascades be predicted? In: Proceedings of the 23rd international conference on World wide web. International World Wide Web Conferences Steering Committee, pp 925–936

  4. Cohen R, Havlin S, Ben-Avraham D (2003) Efficient immunization strategies for computer networks and populations. Phys Rev Lett 91(24):247901

    Article  Google Scholar 

  5. Cox D (1972) Regression models and life-tables. J R Stat Soc Ser B 34(2):187–220

    MathSciNet  MATH  Google Scholar 

  6. Cui P, Jin S, Yu L, Wang F, Zhu W, Yang S (2013) Cascading outbreak prediction in networks: a data-driven approach. In: Proceedings of the 19th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 901–909

  7. Cui P, Wang F, Liu S, Ou M, Yang S, Sun L (2011) Who should share what?: item-level social influence prediction for users and posts ranking. In: Proceedings of the 34th international ACM SIGIR conference on research and development in information retrieval, ACM, pp 185–194

  8. Domingos P, Richardson M (2001) Mining the network value of customers. In: Proceedings of the seventh ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 57–66

  9. Du N, Song L, Woo H, Zha H (2013) Uncover topic-sensitive information diffusion networks. In: Proceedings of the sixteenth international conference on artificial intelligence and statistics, pp 229–237

  10. Gionis A, Terzi E, Tsaparas P (2013) Opinion maximization in social networks. arXiv preprint arXiv:1301.7455

  11. Gomez-Rodriguez M, Leskovec J, Schölkopf B (2013) Modeling information propagation with survival theory. In: ICML

  12. Gomez Rodriguez M, Leskovec J, Schölkopf B (2013) Structure and dynamics of information pathways in online media. In: Proceedings of the sixth ACM international conference on web search and data mining. ACM, pp 23–32

  13. Kempe D, Kleinberg J, Tardos É (2003) Maximizing the spread of influence through a social network. In: Proceedings of the ninth ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 137–146

  14. Miller RG Jr (2011) Survival analysis, vol 66. Wiley, London

    Google Scholar 

  15. Myers S, Leskovec J (2010) On the convexity of latent social network inference. In: Advances in neural information processing systems, pp 1741–1749

  16. Pinder III JE, Wiener JG, Smith MH (1978) The weibull distribution: a new method of summarizing survivorship data. Ecology 59(1):175–179. doi:10.2307/1936645

  17. Rodriguez G (2005) Parametric survival models. Lectures Notes, Princeton University

  18. Rodriguez MG, Balduzzi D, Schölkopf B (2011) Uncovering the temporal dynamics of diffusion networks. arXiv preprint arXiv:1105.0697

  19. Rodriguez MG, Schölkopf B (2012) Influence maximization in continuous time diffusion networks. arXiv preprint arXiv:1205.1682

  20. Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B 58(1):267–288. http://www.jstor.org/stable/2346178

  21. Yu L, Cui P, Wang F, Song C, Yang S (2015) From micro to macro: uncovering and predicting information cascading process with behavioral dynamics. In: Data mining (ICDM), 2015 IEEE international conference on, pp 559–568. IEEE

Download references

Acknowledgments

This work was supported by National Program on Key Basic Research Project, No. 2015CB352300; National Natural Science Foundation of China, Nos. 61370022, 61531006, 61472444 and 61210008; and International Science and Technology Cooperation Program of China, No. 2013DFG12870. Thanks for the research fund of Tsinghua-Tencent Joint Laboratory for Internet Innovation Technology, and Beijing Key Laboratory of Networked Multimedia.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Linyun Yu.

Appendix: Proof of Theorem 1

Appendix: Proof of Theorem 1

Proof

It is evident that both \(G_2(\beta , \lambda )\) and \(G_3(\gamma , k)\) have global minimum value. Next we prove that \(G_1(\lambda , k)\) also has global minimum value, or to prove \(\log {L(\lambda , k)}\) has global maximum value.

Let \(\lambda _i'=\lambda _i^{-k_i}\), \(\log {L'(\lambda ', k)}=\log {L(\lambda , k)}=\sum _{i=1}^N l_i'(\lambda _i',k_i)\) where \(l_i'(\lambda _i',k_i)=m_i\log {k_i}+(k_i-1)\sum _{j=1}^{m_i}{\log {T_{i,j}}}+m_i\log {\lambda _i'}-\lambda _i'\sum _{j=1}^{m_i}{T_{i,j}^{k_i}}\), the partial derivatives of the \(l_i'\) are given by:

$$\begin{aligned}&\frac{\partial {l_i'}}{\partial {\lambda _i'}}=\frac{m_i}{\lambda _i'}-\sum _{j=1}^{m_i}{T_{i,j}^{k_i}},~~\frac{\partial ^2{l_i'}}{\partial {\lambda _i'^2}}=-\frac{m_i}{\lambda _i'^2}<0 \end{aligned}$$
(15)
$$\begin{aligned}&\frac{\partial {l_i'}}{\partial {k_i}}=\frac{m_i}{k_i}+\sum _{j=1}^{m_i}{\log {T_{i,j}}}-\lambda _i'\sum _{j=1}^{m_i}{T_{i,j}^{k_i}\log {T_{i,j}}}\end{aligned}$$
(16)
$$\begin{aligned}&\frac{\partial ^2{l_i'}}{\partial {k_i^2}}=-\frac{m_i}{k_i^2}-\lambda _i'\sum _{j=1}^{m_i}{T_{i,j}^{k_i}\left( \log {T_{i,j}}\right) ^2}<0 \end{aligned}$$
(17)

Since \(\frac{\partial ^2{l_i'}}{\partial {\lambda _i'^2}}<0\) and \(\frac{\partial ^2{l_i'}}{\partial {k_i^2}}<0\), the conditional marginal posterior densities of parameters \(\lambda _i'\) and \(k_i\) are log-concave. Moreover, when \(0<k_i<1, 0<\lambda _i'<\min \left( \frac{m_i}{\sum _{i=1}^{m_i}T_{i,j}^{k_i}},\frac{1}{\sum _{j=1}^{m_i}{T_{i,j}\log {T_i}}}\right) \),

$$\begin{aligned} \frac{\partial {l_i'}}{\partial {\lambda _i'}}&=\frac{m_i}{\lambda _i'}-\sum _{j=1}^{m_i}{T_{i,j}^{k_i}}\ge \frac{m_i}{\frac{m_i}{\sum _{i=1}^{m_i}T_{i,j}^{k_i}}}-\sum _{j=1}{m_i}{T_{i,j}^{k_i}}=0\nonumber \\ \frac{\partial {l_i'}}{\partial {k_i}}&=\frac{m_i}{k_i}+\sum _{j=1}^{m_i}{\log {T_{i,j}}}-\lambda _i'\sum _{j=1}^{m_i}{T_{i,j}^{k_i}\log {T_{i,j}}} \nonumber \\&\ge \frac{m_i}{k_i}+\sum _{j=1}^{m_i}{\log {T_{i,j}}}-\frac{\sum _{j=1}^{m_i}{T_{i,j}^{k_i}\log {T_{i,j}}}}{\sum _{j=1}^{m_i}{T_{i,j}\log {T_i}}}\ge m_i + \sum _{j=1}^{m_i}{\log {T_{i,j}}} - 1 > 0 \end{aligned}$$
(18)

when \(k_i \ge \max \left( 1, \frac{m_i}{\lambda _i'\sum _{j=1}^{m_i}{T_{i,j}\log {T_{i,j}}} - \sum _{j=1}^{m_i}{\log {T_{i,j}}}} \right) \) and \(\lambda _i'\ge \max \left( 1, \frac{m_i}{\sum _{j=1}^{m_i}T_{i,j}^{k_i}}\right) \),

$$\begin{aligned} \frac{\partial {l_i'}}{\partial {\lambda _i'}}&=\frac{m_i}{\lambda _i'}-\sum _{j=1}^{m_i}{T_{i,j}^{k_i}}\le \frac{m_i}{\frac{m_i}{\sum _{i=1}^{m_i}T_{i,j}^{k_i}}}-\sum _{j=1}{m_i}{T_{i,j}^{k_i}}=0 \end{aligned}$$
(19)
$$\begin{aligned} \frac{\partial {l_i'}}{\partial {k_i}}&=\frac{m_i}{k_i}+\sum _{j=1}^{m_i}{\log {T_{i,j}}}-\lambda _i'\sum _{j=1}^{m_i}{T_{i,j}^{k_i}\log {T_{i,j}}}\nonumber \\&\le \frac{m_i}{\frac{m_i}{\lambda _i'\sum _{j=1}^{m_i}{T_{i,j}\log {T_{i,j}}} - \sum _{j=1}^{m_i}{\log {T_{i,j}}}}} +\sum _{j=1}^{m_i}{\log {T_{i,j}}}-\lambda _i'\sum _{j=1}^{m_i}{T_{i,j}^{k_i}\log {T_{i,j}}}\nonumber \\&= \lambda _i'\sum _{j=1}^{m_i}{T_{i,j}\log {T_{i,j}}} - \sum _{j=1}^{m_i}{\log {T_{i,j}}}+\sum _{j=1}^{m_i}{\log {T_{i,j}}}-\lambda _i'\sum _{j=1}^{m_i}{T_{i,j}^{k_i}\log {T_{i,j}}}\nonumber \\&<0 \end{aligned}$$
(20)

which means there should be a global maximum of \(l_i'\), so does \(\log L\). \(\square \)

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Yu, L., Cui, P., Wang, F. et al. Uncovering and predicting the dynamic process of information cascades with survival model. Knowl Inf Syst 50, 633–659 (2017). https://doi.org/10.1007/s10115-016-0955-7

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10115-016-0955-7

Keywords

Navigation