An exploration of submissions and discussions in social news: mining collective intelligence of Reddit

  • Original Article
  • Social Network Analysis and Mining

Abstract

Social news and content aggregation Web sites have become massive repositories of valuable knowledge on a diverse range of topics. Millions of Web users leverage these platforms to submit, view and discuss nearly anything. The content is curated exclusively by the users themselves through an intricate system of submissions, voting and discussion. Furthermore, the user base organizes the data on social news Web sites extremely well, which, as with Wikipedia, opens the door to leveraging this data for other purposes. In this paper, we study a popular social news Web site called Reddit. Our investigation looks at the dynamics of hierarchical discussion threads, and we ask three questions: (1) to what extent do discussion threads resemble a topical hierarchy? (2) can discussion threads be used to enhance Web search? and (3) which variables are the best predictors of high-scoring comments? We report results for these questions on a very large snapshot of several sub-communities of the Reddit Web site. Finally, we discuss the implications of these results and suggest ways in which social news Web sites can be used to perform other tasks.
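
The abstract treats a discussion thread as a hierarchy of replies. As a rough illustration only, the sketch below assumes a hypothetical flat record per comment (an `id`, a `parent_id` that is `None` for top-level replies, and a `score`); these field names are illustrative and are not Reddit's API schema or the paper's data format. It rebuilds the reply tree from parent pointers and reports, for each depth, how many comments sit there and their mean score, the kind of per-level profile one might inspect when asking how thread structure relates to topical hierarchy and to comment scores.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Comment:
    # Hypothetical minimal record; field names are illustrative, not Reddit's API schema.
    id: str
    parent_id: Optional[str]  # None marks a top-level reply to the submission
    score: int


def thread_profile(comments: List[Comment]) -> Dict[int, Tuple[int, float]]:
    """Return {depth: (number of comments, mean score)} for one discussion thread.

    Top-level comments have depth 1; each reply is one level deeper than its parent.
    Comments whose parent is absent from the list are unreachable and are skipped.
    """
    by_id = {c.id: c for c in comments}
    children: Dict[Optional[str], List[str]] = defaultdict(list)
    for c in comments:
        children[c.parent_id].append(c.id)

    # depth -> [comment count, score sum]
    totals: Dict[int, List[int]] = defaultdict(lambda: [0, 0])
    # Breadth-first traversal starting from the top-level comments.
    queue = deque((cid, 1) for cid in children[None])
    while queue:
        cid, depth = queue.popleft()
        totals[depth][0] += 1
        totals[depth][1] += by_id[cid].score
        queue.extend((child, depth + 1) for child in children[cid])

    return {d: (n, s / n) for d, (n, s) in sorted(totals.items())}


if __name__ == "__main__":
    # Toy thread (made-up values): a and e are top-level; b and c reply to a; d replies to b.
    thread = [
        Comment("a", None, 10),
        Comment("b", "a", 4),
        Comment("c", "a", 2),
        Comment("d", "b", 1),
        Comment("e", None, 3),
    ]
    print(thread_profile(thread))  # {1: (2, 6.5), 2: (2, 3.0), 3: (1, 1.0)}
```

Breadth-first traversal is used so that each comment's depth is known the moment it is dequeued; a recursive depth-first walk would work equally well for shallow threads.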

Figs. 1–19 (images not included)

Notes

  1. Zimmerman was charged with, and later acquitted of, murder in the shooting death of an unarmed black teenager in Sanford, Florida, in February 2012; cf. http://en.wikipedia.org/wiki/Shooting_of_Trayvon_Martin.

  2. According to http://Alexa.com, accessed September 27, 2013.

  3. http://www.reddit.com/wiki/faq.

  4. http://www.reddit.com/reddits/, accessed July 24, 2012.

  5. http://blog.reddit.com/.

  6. http://reddit.com/about, accessed November 19, 2013.

  7. Compared to the best fits of a power law (a = 40.08, k = , R² = 4.836) and a second-degree polynomial [p = (54310, 354700, 543800), R² = 0.9148].

  8. Full discussion available at http://redd.it/18l9je/.
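
Note 7 compares candidate fits by their R² values. As a reminder of what that statistic measures, here is a minimal sketch of the standard coefficient of determination, R² = 1 − SS_res / SS_tot, assuming that is the definition intended; the paper's actual fitting procedure is not shown in this preview. The `polyval` helper is a hypothetical illustration that evaluates a polynomial with coefficients ordered from highest degree to lowest (the convention NumPy's `polyval` uses); whether note 7's `p` follows that ordering is an assumption.

```python
from typing import Sequence


def polyval(coeffs: Sequence[float], x: float) -> float:
    """Evaluate a polynomial with coefficients ordered from highest degree to lowest."""
    result = 0.0
    for c in coeffs:
        result = result * x + c  # Horner's rule
    return result


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two or more paired observations")
    mean = sum(observed) / len(observed)
    ss_tot = sum((y - mean) ** 2 for y in observed)  # total variation around the mean
    ss_res = sum((y - f) ** 2 for y, f in zip(observed, predicted))  # unexplained variation
    if ss_tot == 0:
        raise ValueError("observed values are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Toy data generated from x^2 + 2x + 3 (made-up values, not from the paper).
    ys = [polyval([1, 2, 3], x) for x in range(5)]
    print(ys)                          # [3.0, 6.0, 11.0, 18.0, 27.0]
    print(r_squared(ys, ys))           # 1.0: a perfect fit
    print(r_squared(ys, [13.0] * 5))   # 0.0: no better than predicting the mean
```

An R² of 1 means the model reproduces every observation, while 0 means it does no better than the sample mean; values closer to 1 indicate a better least-squares fit.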

Acknowledgments

This work is funded by the National Science Foundation Graduate Research Fellowship.

Author information

Correspondence to Tim Weninger.

About this article

Cite this article

Weninger, T. An exploration of submissions and discussions in social news: mining collective intelligence of Reddit. Soc. Netw. Anal. Min. 4, 173 (2014). https://doi.org/10.1007/s13278-014-0173-9

  • DOI: https://doi.org/10.1007/s13278-014-0173-9
