
Online discussion threads as conversation pools: predicting the growth of discussion threads on reddit

Computational and Mathematical Organization Theory

Abstract

This paper proposes a data-driven method for forecasting groups of topic-related, overlapping online conversation trees. Our method is generative: given a group of original posts, it generates the resulting conversation threads, including timing and authorship information. Using two large Reddit datasets, we demonstrate that the microscopic properties of such groups of conversations can be accurately predicted from the original posts alone, without knowledge of the intermediate reactions to those posts. We show that our solution significantly outperforms competitive baselines in predicting conversation structure and user engagement over time. Potential benefits include evaluating intervention strategies to limit disinformation.
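The abstract describes the output of the method as a pool of conversation threads: one reply tree per original post, with each comment carrying a timestamp and an author. The sketch below illustrates only that output shape and a few structural measurements on it (thread size, depth, distinct authors). The generator is a hypothetical toy branching process (binomial reply counts, exponentially distributed reply delays, authors drawn uniformly from a user list); it is not the authors' model, and every name, parameter and the time unit are assumptions made for illustration.

```python
from __future__ import annotations

import random
from dataclasses import dataclass, field


@dataclass
class Comment:
    """One node of a conversation tree (hypothetical representation)."""
    author: str
    time: float  # assumed unit: hours since the original post
    children: list[Comment] = field(default_factory=list)


def generate_thread(root_author: str, users: list[str], rng: random.Random,
                    max_replies: int = 3, p_reply: float = 0.3,
                    reply_rate: float = 1.0, max_depth: int = 4) -> Comment:
    """Toy generator, NOT the paper's model: each comment receives a
    Binomial(max_replies, p_reply) number of direct replies; each reply
    arrives after an Exponential(reply_rate) delay and is written by a
    user drawn uniformly at random. Growth stops at max_depth."""
    root = Comment(author=root_author, time=0.0)
    frontier = [(root, 0)]  # (comment, depth of that comment)
    while frontier:
        node, depth = frontier.pop()
        if depth >= max_depth:
            continue
        n_replies = sum(rng.random() < p_reply for _ in range(max_replies))
        for _ in range(n_replies):
            child = Comment(author=rng.choice(users),
                            time=node.time + rng.expovariate(reply_rate))
            node.children.append(child)
            frontier.append((child, depth + 1))
    return root


def generate_pool(original_posters, users, rng, **params):
    """Generate one thread per original post in the group."""
    return [generate_thread(op, users, rng, **params) for op in original_posters]


def thread_size(node: Comment) -> int:
    """Number of comments in the tree, counting the original post."""
    return 1 + sum(thread_size(c) for c in node.children)


def thread_depth(node: Comment) -> int:
    """Length of the longest root-to-leaf path, counted in nodes."""
    return 1 + max((thread_depth(c) for c in node.children), default=0)


def thread_authors(node: Comment) -> set[str]:
    """Distinct authors participating in the thread."""
    authors = {node.author}
    for c in node.children:
        authors |= thread_authors(c)
    return authors


if __name__ == "__main__":
    rng = random.Random(7)
    users = [f"user{i}" for i in range(20)]
    pool = generate_pool(["op_a", "op_b", "op_c"], users, rng)
    for t in pool:
        print(t.author, thread_size(t), thread_depth(t), len(thread_authors(t)))
```

Size, depth and author counts are stand-ins for the kind of structural and engagement measures the abstract mentions; a real evaluation would compare such measures between generated and observed threads.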

Figs. 1–7

Acknowledgements

This work is supported by the DARPA SocialSim Program and the Air Force Research Laboratory under contract FA8650-18-C-7825. The authors would like to thank Leidos for providing data.

Author information

Correspondence to Sameera Horawalavithana.

About this article

Cite this article

Horawalavithana, S., Choudhury, N., Skvoretz, J. et al. Online discussion threads as conversation pools: predicting the growth of discussion threads on reddit. Comput Math Organ Theory 28, 112–140 (2022). https://doi.org/10.1007/s10588-021-09340-1

  • DOI: https://doi.org/10.1007/s10588-021-09340-1
