ABSTRACT
On online social networks such as Twitter, retweeting allows users to share a variety of content with their own followers. As tweets are retweeted from user to user, large propagation cascades form. During the past decade, social networks have been used for political rallying, civil society campaigns, and marketing promotions. The growth of a cascade over time signals the popularity, or lack thereof, of its subject matter. In this work, we ask whether the same feature set can be used to predict cascade growth on any Twitter dataset. We find evidence that the features governing cascade growth vary from one dataset to another. We first define two types of cascade growth, structural and temporal, and examine the prediction of each. We then propose an approach that selects the best features for a given dataset in order to improve prediction accuracy. We use both Random Forest and Multilayer Perceptron models for growth prediction on datasets from two political campaigns in Egypt. The campaigns ran concurrently and rallied for opposing causes, with a noticeable difference in societal popularity. We present and discuss the most discriminating features for predicting cascade growth in the two datasets, and provide evidence that preselecting features improved the accuracy of the prediction task on both datasets studied.
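The idea of per-dataset feature preselection can be illustrated with a minimal sketch. The abstract does not state which selection criterion is used, so the sketch assumes a simple stand-in: ranking features by the absolute Pearson correlation between each feature and a binary growth label. The feature names (`depth`, `early_rate`, `root_followers`) and the toy values are hypothetical and are not taken from the paper's data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no signal
    return cov / sqrt(var_x * var_y)


def select_features(rows, labels, names, k):
    """Return the k feature names most correlated (in absolute value)
    with the label. This is an assumed stand-in criterion, not the
    paper's confirmed method."""
    scores = {}
    for j, name in enumerate(names):
        column = [row[j] for row in rows]
        scores[name] = abs(pearson(column, labels))
    ranked = sorted(names, key=lambda nm: scores[nm], reverse=True)
    return ranked[:k]


# Hypothetical feature names and toy cascades (1 = cascade grew, 0 = did not).
NAMES = ["depth", "early_rate", "root_followers"]
LABELS = [0, 0, 0, 1, 1, 1]

# In toy dataset A, growth tracks the structural feature "depth".
DATASET_A = [
    [1, 3, 10], [2, 1, 20], [1, 2, 30],
    [5, 2, 20], [6, 3, 30], [5, 1, 40],
]
# In toy dataset B, growth tracks the temporal feature "early_rate".
DATASET_B = [
    [3, 1, 10], [1, 2, 20], [2, 1, 30],
    [2, 5, 20], [3, 6, 30], [1, 5, 40],
]

selected_a = select_features(DATASET_A, LABELS, NAMES, k=2)
selected_b = select_features(DATASET_B, LABELS, NAMES, k=2)
print("dataset A:", selected_a)
print("dataset B:", selected_b)
```

In this toy setting the two datasets yield different selected subsets, which mirrors the claim that the features governing growth vary by dataset. The selected columns would then be passed to a classifier such as a Random Forest or a Multilayer Perceptron; that step is omitted here to keep the sketch free of external dependencies.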