Uncovering and predicting the dynamic process of information cascades with survival model

Yu, Linyun; Cui, Peng; Wang, Fei; Song, Chaoming; Yang, Shiqiang

doi:10.1007/s10115-016-0955-7

Uncovering and predicting the dynamic process of information cascades with survival model

Regular Paper
Published: 12 May 2016

Volume 50, pages 633–659, (2017)
Cite this article

Knowledge and Information Systems Aims and scope Submit manuscript

Linyun Yu¹,
Peng Cui¹,
Fei Wang²,
Chaoming Song³ &
…
Shiqiang Yang¹

600 Accesses
15 Citations
Explore all metrics

Abstract

Cascades are ubiquitous in various network environments. Predicting these cascades is decidedly nontrivial in various important applications, such as viral marketing, epidemic prevention, and traffic management. Most previous works have focused on predicting the final cascade sizes. As cascades are dynamic processes, it is always interesting and important to predict the cascade size at any given time, or to predict the time when a cascade will reach a certain size (e.g., the threshold for an outbreak). In this paper, we unify all these tasks into a fundamental problem: cascading process prediction. That is, given the early stage of a cascade, can we predict its cumulative cascade size at any later time? For such a challenging problem, an understanding of the micromechanism that drives and generates the macrophenomena (i.e., the cascading process) is essential. Here, we introduce behavioral dynamics as the micromechanism to describe the dynamic process of an infected node’s neighbors getting infected by a cascade (i.e., one-hop sub-cascades). Through data-driven analysis, we find out the common principles and patterns lying in the behavioral dynamics and propose the novel NEtworked WEibull Regression model for modeling it. We also propose a novel method for predicting cascading processes by effectively aggregating behavioral dynamics and present a scalable solution to approximate the cascading process with a theoretical guarantee. We evaluate the proposed method extensively on a large-scale social network dataset. The results demonstrate that the proposed method can significantly outperform other state-of-the-art methods in multiple tasks including cascade size prediction, outbreak time prediction, and cascading process prediction.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Structure Pattern Analysis and Cascade Prediction in Social Networks

The unified model of social influence and its application in influence maximization

Article 14 November 2015

Learning the Hotness of Information Diffusions with Multi-dimensional Hawkes Processes

Notes

Here, the cascades are information cascades. When a user re-tweets/generates a post, several of his/her followers will further re-tweet the post and so on to form an information cascade.
The dataset is complete and publicly available at http://www.thumedia.org.
We think that a follower with a different re-tweet number will have different effects to the user, and so we modify the weights of each term $follower\_avg\_inflow\_rate$ and $follower\_avg\_retweet\_rate$.
It will be counted multiple times if a specific user is involved in multiple cascades.

References

Chen W, Collins A, Cummings R, Ke T, Liu Z, Rincon D, Sun X, Wang Y, Wei W, Yuan Y (2011) Influence maximization in social networks when negative opinions may emerge and propagate. In: SDM, vol. 11, pp 379–390. SIAM
Chen W, Wang Y, Yang S (2009) Efficient influence maximization in social networks. In: Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 199–208
Cheng J, Adamic L, Dow PA, Kleinberg JM, Leskovec J (2014) Can cascades be predicted? In: Proceedings of the 23rd international conference on World wide web. International World Wide Web Conferences Steering Committee, pp 925–936
Cohen R, Havlin S, Ben-Avraham D (2003) Efficient immunization strategies for computer networks and populations. Phys Rev Lett 91(24):247901
Article Google Scholar
Cox D (1972) Regression models and life-tables. J R Stat Soc Ser B 34(2):187–220
MathSciNet MATH Google Scholar
Cui P, Jin S, Yu L, Wang F, Zhu W, Yang S (2013) Cascading outbreak prediction in networks: a data-driven approach. In: Proceedings of the 19th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 901–909
Cui P, Wang F, Liu S, Ou M, Yang S, Sun L (2011) Who should share what?: item-level social influence prediction for users and posts ranking. In: Proceedings of the 34th international ACM SIGIR conference on research and development in information retrieval, ACM, pp 185–194
Domingos P, Richardson M (2001) Mining the network value of customers. In: Proceedings of the seventh ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 57–66
Du N, Song L, Woo H, Zha H (2013) Uncover topic-sensitive information diffusion networks. In: Proceedings of the sixteenth international conference on artificial intelligence and statistics, pp 229–237
Gionis A, Terzi E, Tsaparas P (2013) Opinion maximization in social networks. arXiv preprint arXiv:1301.7455
Gomez-Rodriguez M, Leskovec J, Schölkopf B (2013) Modeling information propagation with survival theory. In: ICML
Gomez Rodriguez M, Leskovec J, Schölkopf B (2013) Structure and dynamics of information pathways in online media. In: Proceedings of the sixth ACM international conference on web search and data mining. ACM, pp 23–32
Kempe D, Kleinberg J, Tardos É (2003) Maximizing the spread of influence through a social network. In: Proceedings of the ninth ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 137–146
Miller RG Jr (2011) Survival analysis, vol 66. Wiley, London
Google Scholar
Myers S, Leskovec J (2010) On the convexity of latent social network inference. In: Advances in neural information processing systems, pp 1741–1749
Pinder III JE, Wiener JG, Smith MH (1978) The weibull distribution: a new method of summarizing survivorship data. Ecology 59(1):175–179. doi:10.2307/1936645
Rodriguez G (2005) Parametric survival models. Lectures Notes, Princeton University
Rodriguez MG, Balduzzi D, Schölkopf B (2011) Uncovering the temporal dynamics of diffusion networks. arXiv preprint arXiv:1105.0697
Rodriguez MG, Schölkopf B (2012) Influence maximization in continuous time diffusion networks. arXiv preprint arXiv:1205.1682
Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B 58(1):267–288. http://www.jstor.org/stable/2346178
Yu L, Cui P, Wang F, Song C, Yang S (2015) From micro to macro: uncovering and predicting information cascading process with behavioral dynamics. In: Data mining (ICDM), 2015 IEEE international conference on, pp 559–568. IEEE

Download references

Acknowledgments

This work was supported by National Program on Key Basic Research Project, No. 2015CB352300; National Natural Science Foundation of China, Nos. 61370022, 61531006, 61472444 and 61210008; and International Science and Technology Cooperation Program of China, No. 2013DFG12870. Thanks for the research fund of Tsinghua-Tencent Joint Laboratory for Internet Innovation Technology, and Beijing Key Laboratory of Networked Multimedia.

Author information

Authors and Affiliations

Tsinghua National Laboratory for Information Science and Technology, Department of Computer Science and Technology, Tsinghua University, Beijing, China
Linyun Yu, Peng Cui & Shiqiang Yang
School of Software Engineering, Beijing University of Technology, Beijing, China
Fei Wang
Department of Physics, University of Miami, Coral Gables, FL, USA
Chaoming Song

Authors

Linyun Yu
View author publications
You can also search for this author in PubMed Google Scholar
Peng Cui
View author publications
You can also search for this author in PubMed Google Scholar
Fei Wang
View author publications
You can also search for this author in PubMed Google Scholar
Chaoming Song
View author publications
You can also search for this author in PubMed Google Scholar
Shiqiang Yang
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Linyun Yu.

Appendix: Proof of Theorem 1

Proof

It is evident that both $G_2(\beta , \lambda )$ and $G_3(\gamma , k)$ have global minimum value. Next we prove that $G_1(\lambda , k)$ also has global minimum value, or to prove $\log {L(\lambda , k)}$ has global maximum value.

Let $\lambda _i'=\lambda _i^{-k_i}$, $\log {L'(\lambda ', k)}=\log {L(\lambda , k)}=\sum _{i=1}^N l_i'(\lambda _i',k_i)$ where $l_i'(\lambda _i',k_i)=m_i\log {k_i}+(k_i-1)\sum _{j=1}^{m_i}{\log {T_{i,j}}}+m_i\log {\lambda _i'}-\lambda _i'\sum _{j=1}^{m_i}{T_{i,j}^{k_i}}$, the partial derivatives of the $l_i'$ are given by:

$$\begin{aligned}&\frac{\partial {l_i'}}{\partial {\lambda _i'}}=\frac{m_i}{\lambda _i'}-\sum _{j=1}^{m_i}{T_{i,j}^{k_i}},~~\frac{\partial ^2{l_i'}}{\partial {\lambda _i'^2}}=-\frac{m_i}{\lambda _i'^2}<0 \end{aligned}$$

(15)

$$\begin{aligned}&\frac{\partial {l_i'}}{\partial {k_i}}=\frac{m_i}{k_i}+\sum _{j=1}^{m_i}{\log {T_{i,j}}}-\lambda _i'\sum _{j=1}^{m_i}{T_{i,j}^{k_i}\log {T_{i,j}}}\end{aligned}$$

(16)

$$\begin{aligned}&\frac{\partial ^2{l_i'}}{\partial {k_i^2}}=-\frac{m_i}{k_i^2}-\lambda _i'\sum _{j=1}^{m_i}{T_{i,j}^{k_i}\left( \log {T_{i,j}}\right) ^2}<0 \end{aligned}$$

(17)

Since $\frac{\partial ^2{l_i'}}{\partial {\lambda _i'^2}}<0$ and $\frac{\partial ^2{l_i'}}{\partial {k_i^2}}<0$, the conditional marginal posterior densities of parameters $\lambda _i'$ and $k_i$ are log-concave. Moreover, when $0<k_i<1, 0<\lambda _i'<\min \left( \frac{m_i}{\sum _{i=1}^{m_i}T_{i,j}^{k_i}},\frac{1}{\sum _{j=1}^{m_i}{T_{i,j}\log {T_i}}}\right) $,

$$\begin{aligned} \frac{\partial {l_i'}}{\partial {\lambda _i'}}&=\frac{m_i}{\lambda _i'}-\sum _{j=1}^{m_i}{T_{i,j}^{k_i}}\ge \frac{m_i}{\frac{m_i}{\sum _{i=1}^{m_i}T_{i,j}^{k_i}}}-\sum _{j=1}{m_i}{T_{i,j}^{k_i}}=0\nonumber \\ \frac{\partial {l_i'}}{\partial {k_i}}&=\frac{m_i}{k_i}+\sum _{j=1}^{m_i}{\log {T_{i,j}}}-\lambda _i'\sum _{j=1}^{m_i}{T_{i,j}^{k_i}\log {T_{i,j}}} \nonumber \\&\ge \frac{m_i}{k_i}+\sum _{j=1}^{m_i}{\log {T_{i,j}}}-\frac{\sum _{j=1}^{m_i}{T_{i,j}^{k_i}\log {T_{i,j}}}}{\sum _{j=1}^{m_i}{T_{i,j}\log {T_i}}}\ge m_i + \sum _{j=1}^{m_i}{\log {T_{i,j}}} - 1 > 0 \end{aligned}$$

(18)

when $k_i \ge \max \left( 1, \frac{m_i}{\lambda _i'\sum _{j=1}^{m_i}{T_{i,j}\log {T_{i,j}}} - \sum _{j=1}^{m_i}{\log {T_{i,j}}}} \right) $ and $\lambda _i'\ge \max \left( 1, \frac{m_i}{\sum _{j=1}^{m_i}T_{i,j}^{k_i}}\right) $,

$$\begin{aligned} \frac{\partial {l_i'}}{\partial {\lambda _i'}}&=\frac{m_i}{\lambda _i'}-\sum _{j=1}^{m_i}{T_{i,j}^{k_i}}\le \frac{m_i}{\frac{m_i}{\sum _{i=1}^{m_i}T_{i,j}^{k_i}}}-\sum _{j=1}{m_i}{T_{i,j}^{k_i}}=0 \end{aligned}$$

(19)

$$\begin{aligned} \frac{\partial {l_i'}}{\partial {k_i}}&=\frac{m_i}{k_i}+\sum _{j=1}^{m_i}{\log {T_{i,j}}}-\lambda _i'\sum _{j=1}^{m_i}{T_{i,j}^{k_i}\log {T_{i,j}}}\nonumber \\&\le \frac{m_i}{\frac{m_i}{\lambda _i'\sum _{j=1}^{m_i}{T_{i,j}\log {T_{i,j}}} - \sum _{j=1}^{m_i}{\log {T_{i,j}}}}} +\sum _{j=1}^{m_i}{\log {T_{i,j}}}-\lambda _i'\sum _{j=1}^{m_i}{T_{i,j}^{k_i}\log {T_{i,j}}}\nonumber \\&= \lambda _i'\sum _{j=1}^{m_i}{T_{i,j}\log {T_{i,j}}} - \sum _{j=1}^{m_i}{\log {T_{i,j}}}+\sum _{j=1}^{m_i}{\log {T_{i,j}}}-\lambda _i'\sum _{j=1}^{m_i}{T_{i,j}^{k_i}\log {T_{i,j}}}\nonumber \\&<0 \end{aligned}$$

(20)

which means there should be a global maximum of $l_i'$, so does $\log L$. $\square $

Rights and permissions

Reprints and permissions

About this article

Cite this article

Yu, L., Cui, P., Wang, F. et al. Uncovering and predicting the dynamic process of information cascades with survival model. Knowl Inf Syst 50, 633–659 (2017). https://doi.org/10.1007/s10115-016-0955-7

Download citation

Received: 17 March 2016
Revised: 18 March 2016
Accepted: 02 April 2016
Published: 12 May 2016
Issue Date: February 2017
DOI: https://doi.org/10.1007/s10115-016-0955-7

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Uncovering and predicting the dynamic process of information cascades with survival model

Abstract

Access this article

Similar content being viewed by others

Structure Pattern Analysis and Cascade Prediction in Social Networks

The unified model of social influence and its application in influence maximization

Learning the Hotness of Information Diffusions with Multi-dimensional Hawkes Processes

Notes

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Appendix: Proof of Theorem 1

Proof

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Uncovering and predicting the dynamic process of information cascades with survival model

Abstract

Access this article

Similar content being viewed by others

Structure Pattern Analysis and Cascade Prediction in Social Networks

The unified model of social influence and its application in influence maximization

Learning the Hotness of Information Diffusions with Multi-dimensional Hawkes Processes

Notes

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Appendix: Proof of Theorem 1

Appendix: Proof of Theorem 1

Proof

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation