Abstract
Studying the bursty nature of cascades in social media is important in many real applications such as product sales prediction, disaster relief, and stock market prediction. Although both cascade size prediction and the burst patterns of cascades have been extensively studied, how to predict when a burst will come remains an open problem. This task is difficult for traditional time-series-based models, such as regression models, to address directly. First, time-series-based prediction models focus on predicting future values from previously observed ones, which makes it hard to apply them to predicting the time of a burst with the "quick rise-and-fall" pattern. Second, besides cascade popularity, much other side information, such as user profiles and social relations, is available in social media. Although this information is potentially very useful, it is also hard for time-series-based models to capture such rich information in diverse formats and integrate it seamlessly. This paper proposes a classification-based approach for burst time prediction that exploits rich knowledge in information diffusion. Specifically, we first propose a time-window-based transformation that predicts in which time window the burst will appear. By dividing the time span of every cascade into the same number K of time windows, cascades with diverse time spans can be handled uniformly. To exploit the rich and heterogeneous information in social media, we next propose a scale-independent feature extraction framework that models this heterogeneous knowledge in a scale-independent manner. Systematic evaluations are conducted on the Sina Weibo reposting dataset and the MemeTracker dataset. Besides the superior performance of the proposed approach, we also observe that: (1) surprisingly, social/structural knowledge is more indicative of bursts than cascade popularity information, especially for bursts occurring further in the future; (2) larger cascades are harder to predict, as the spreading process of cascades with higher popularity is usually more diverse and fluctuating; and (3) the proposed approach is robust in the sense that its results are not very sensitive to the popularity of the training cascades.
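As a rough illustration of the time-window-based transformation, the Python sketch below maps a cascade's repost timestamps onto K equal windows of that cascade's own time span. The abstract does not say how the burst itself is located, so the label used here (the window holding the most reposts) is an assumption, and the function names `window_counts`, `burst_window`, and `window_shares` are hypothetical. The normalized share vector is only one simple example of a scale-independent quantity, not the paper's feature extraction framework.

```python
from typing import List, Sequence


def window_counts(timestamps: Sequence[float], k: int) -> List[int]:
    """Split the cascade's own time span into k equal windows and count events per window.

    Assumption: the span runs from the first to the last observed event, so a
    cascade lasting seconds and one lasting days are both mapped onto k windows.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if not timestamps:
        raise ValueError("cascade must contain at least one event")
    start, end = min(timestamps), max(timestamps)
    span = end - start
    counts = [0] * k
    for t in timestamps:
        if span == 0:
            idx = 0  # degenerate cascade: every event falls into the first window
        else:
            # Map t into [0, k); the final event (t == end) would map to k, so clamp it.
            idx = min(int((t - start) / span * k), k - 1)
        counts[idx] += 1
    return counts


def burst_window(timestamps: Sequence[float], k: int) -> int:
    """Hypothetical class label: index of the window with the most events (ties -> earliest)."""
    counts = window_counts(timestamps, k)
    return counts.index(max(counts))


def window_shares(timestamps: Sequence[float], k: int) -> List[float]:
    """Scale-independent view: fraction of the cascade's events in each window."""
    counts = window_counts(timestamps, k)
    total = sum(counts)
    return [c / total for c in counts]
```

For the toy cascade `small = [0, 1, 2, 9, 9.5, 10]` with K = 5, `window_counts` returns `[2, 1, 0, 0, 3]` and `burst_window` returns 4. A cascade ten times larger that lasts 3600 seconds instead of 10 (each timestamp scaled by 360 and repeated ten times) yields the same window index and the same share vector, which is the point of normalizing by each cascade's own span and size.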

Acknowledgments
This work is supported in part by the National Natural Science Foundation of China (Grant Nos. 61170189, 61370126, 61202239), the National High Technology Research and Development Program of China (Grant No. 2015AA016004), Major Projects of the National Social Science Fund of China (Grant No. 14&ZH0036), the Science and Technology Innovation Ability Promotion Project of Beijing (PXM2015-014203-000059), the Fund of the State Key Laboratory of Software Development Environment (No. SKLSDE-2015ZX-16), the Microsoft Research Asia Fund (No. FY14-RES-OPP-105), the Innovation Foundation of BUAA for PhD Graduates (No. YWF-14-YJSY-021), and the US NSF through Grants III-1526499, CNS-1115234, and OISE-1129076.
About this article
Cite this article
Wang, S., Yan, Z., Hu, X. et al. CPB: a classification-based approach for burst time prediction in cascades. Knowl Inf Syst 49, 243–271 (2016). https://doi.org/10.1007/s10115-015-0899-3
DOI: https://doi.org/10.1007/s10115-015-0899-3