DOI: 10.1145/2908446.2908463
research-article

Towards Feature Selection for Cascade Growth Prediction on Twitter

Published: 09 May 2016

ABSTRACT

On online social networks such as Twitter, retweeting allows users to share a variety of content with their own followers. As tweets are retweeted from user to user, large cascades of tweet propagation form. During the past decade, social networks have been used for political rallying, civil society campaigns, and marketing promotions. The growth of a cascade over time signals the popularity, or lack thereof, of its subject matter. In this work, we ask whether the same feature set can be used for cascade growth prediction on any Twitter dataset. We find evidence that the features governing cascade growth vary from one dataset to another. We first devise definitions of structural and temporal growth. We then propose an approach that selects the best features for a given dataset in order to improve prediction accuracy. We examine two types of growth prediction: structural and temporal. We use both Random Forest and Multilayer Perceptron models for growth prediction on datasets from two political campaigns in Egypt. The campaigns were concurrent and rallied for opposite causes, with a noticeable difference in societal popularity. We present and discuss the most discriminating features for predicting cascade growth in the two datasets, and provide evidence that preselecting features improved the accuracy of the prediction task on both.
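The idea of choosing features per dataset can be illustrated with a minimal sketch. It is not the selection method used in the paper: it assumes a simple filter that ranks features by absolute Pearson correlation with a binary growth label, and it runs on synthetic data. The feature names (`depth`, `followers`, `time_to_first_retweet`) and the helpers `select_features` and `make_dataset` are hypothetical and introduced only for illustration.

```python
import random
from statistics import mean, pstdev

# Hypothetical feature names, chosen only for this illustration.
FEATURES = ["depth", "followers", "time_to_first_retweet"]

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no signal
    cov = mean((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (sx * sy)

def select_features(rows, labels, names, k):
    """Rank features by |correlation| with the label and keep the top k."""
    scores = {
        name: abs(pearson([row[i] for row in rows], labels))
        for i, name in enumerate(names)
    }
    return sorted(names, key=lambda n: scores[n], reverse=True)[:k]

def make_dataset(driver, n=500, seed=0):
    """Synthetic cascades whose growth label depends on one feature index."""
    rng = random.Random(seed)
    rows = [[rng.random() for _ in FEATURES] for _ in range(n)]
    labels = [1 if row[driver] > 0.5 else 0 for row in rows]
    return rows, labels

# Two synthetic "campaigns" whose growth is driven by different features.
rows_a, labels_a = make_dataset(driver=0, seed=1)
rows_b, labels_b = make_dataset(driver=2, seed=2)

selected_a = select_features(rows_a, labels_a, FEATURES, k=1)
selected_b = select_features(rows_b, labels_b, FEATURES, k=1)
print(selected_a, selected_b)
```

Because each synthetic dataset is built so that a different feature drives growth, the filter picks a different top feature for each one. The selected subset would then be passed to a classifier such as a Random Forest or Multilayer Perceptron in place of the full feature set.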


  • Published in

    cover image ACM Other conferences
    INFOS '16: Proceedings of the 10th International Conference on Informatics and Systems
    May 2016
    347 pages
    ISBN:9781450340625
    DOI:10.1145/2908446

    Copyright © 2016 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States



    Qualifiers

    • research-article
    • Research
    • Refereed limited
