research-article

Towards Feature Selection for Cascade Growth Prediction on Twitter

Authors:
Sarah Elsharkawy, Research Department, ITWorx, Egypt
Ghada Hassan, Faculty of Computer and Information Sciences, The British University in Egypt and Ain Shams University
Tarek Nabhan, ITWorx, Egypt
Mohamed Roushdy, Faculty of Computer and Information Sciences, Ain Shams University, Egypt

INFOS '16: Proceedings of the 10th International Conference on Informatics and Systems, May 2016, Pages 166–172, https://doi.org/10.1145/2908446.2908463

Published: 09 May 2016

ABSTRACT

On online social networks such as Twitter, retweeting allows users to share a variety of content with their own followers. As tweets are retweeted from user to user, large propagation cascades are formed. During the past decade, social networks have been used for political rallying, civil society campaigns, and marketing promotions. The growth of cascades over time signals the popularity, or lack thereof, of the subject matter. In this work, we ask whether the same feature set can be used for cascade growth prediction on any Twitter dataset. We find evidence that the features governing cascade growth vary from one dataset to another. We first devise a definition of structural and temporal growth. Then, we propose an approach that selects the best of these features for each dataset to improve prediction accuracy. We examine two types of growth prediction: structural and temporal. We use both Random Forest and Multilayer Perceptron as our models for growth prediction on datasets from two political campaigns in Egypt. The campaigns were concurrent and rallied for opposite causes, with a noticeable difference in societal popularity. We present and discuss the most discriminating features for predicting cascade growth in the two datasets, and provide evidence that preselecting features improved prediction accuracy on both datasets studied.
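The workflow the abstract outlines (rank features per dataset, keep the best ones, then train Random Forest and Multilayer Perceptron models) might be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: it uses scikit-learn, synthetic data in place of the two campaign datasets, mutual information as a stand-in ranking criterion, and a binary growth label. The names `toy_campaign`, `selected_features`, and `cv_accuracy` are hypothetical.

```python
from functools import partial

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: mutual information as the relevance score (a stand-in, not
# necessarily the selection criterion used in the paper).
SCORE = partial(mutual_info_classif, random_state=0)


def toy_campaign(seed, n_features=12):
    """Synthetic stand-in for one campaign: cascade features and a binary growth label."""
    return make_classification(
        n_samples=300,
        n_features=n_features,
        n_informative=4,
        n_redundant=2,
        random_state=seed,
    )


def selected_features(X, y, k):
    """Return the indices of the k highest-ranked features for this dataset."""
    selector = SelectKBest(SCORE, k=k).fit(X, y)
    return selector.get_support(indices=True)


def cv_accuracy(X, y, k=None):
    """Mean 5-fold accuracy per model, with optional per-dataset preselection."""
    models = {
        "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
        "mlp": MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=0),
    }
    scores = {}
    for name, model in models.items():
        steps = [StandardScaler()]
        if k is not None:
            # Selection sits inside the pipeline so it is refit on each training fold.
            steps.append(SelectKBest(SCORE, k=k))
        steps.append(model)
        scores[name] = cross_val_score(make_pipeline(*steps), X, y, cv=5).mean()
    return scores


if __name__ == "__main__":
    for label, seed in [("campaign_A", 1), ("campaign_B", 2)]:
        X, y = toy_campaign(seed)
        print(label, "selected feature indices:", selected_features(X, y, k=5))
        print("  all features:", cv_accuracy(X, y))
        print("  preselected: ", cv_accuracy(X, y, k=5))
```

Running the selection separately for each dataset makes it easy to compare which features rank highest in each campaign. Keeping the selector inside the cross-validation pipeline avoids scoring features on the held-out folds.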

Published in

INFOS '16: Proceedings of the 10th International Conference on Informatics and Systems
May 2016
347 pages
ISBN:9781450340625
DOI:10.1145/2908446

Copyright © 2016 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Cascade Growth Prediction
Feature Selection
Graph Analysis
Information Cascades
Information Diffusion
Qualifiers
- research-article
- Research
- Refereed limited