Abstract
Social news and content aggregation Web sites have become massive repositories of valuable knowledge on a diverse range of topics. Millions of Web users leverage these platforms to submit, view and discuss nearly anything. The users themselves exclusively curate the content through an intricate system of submissions, voting and discussion. Furthermore, the data on social news Web sites is extremely well organized by the user base, which, as with Wikipedia, opens the door to leveraging this data for other purposes. In this paper, we study a popular social news Web site called Reddit. Our investigation looks at the dynamics of hierarchical discussion threads, and we ask three questions: (1) To what extent do discussion threads resemble a topical hierarchy? (2) Can discussion threads be used to enhance Web search? (3) What variables are the best predictors of high-scoring comments? We present results for these questions on a very large snapshot of several sub-communities of the Reddit Web site. Finally, we discuss the implications of these results and suggest ways in which social news Web sites can be used to perform other tasks.
Notes
Zimmerman was indicted for, and later acquitted of, murder in the shooting death of an unarmed black teenager in Sanford, Florida, in February 2012; cf. http://en.wikipedia.org/wiki/Shooting_of_Trayvon_Martin.
According to http://Alexa.com, accessed Sept 27, 2013.
http://www.reddit.com/reddits/, accessed on 7/24/2012.
http://reddit.com/about, accessed 11/19/2013.
Compared to best fits of a power law (a = 40.08, k = R² = 4.836) and a 2-degree polynomial [p = (54310, 354700, 543800), R² = 0.9148].
Full discussion available at http://redd.it/18l9je/.
Acknowledgments
This work is funded by the National Science Foundation Graduate Research Fellowship.
Cite this article
Weninger, T. An exploration of submissions and discussions in social news: mining collective intelligence of Reddit. Soc. Netw. Anal. Min. 4, 173 (2014). https://doi.org/10.1007/s13278-014-0173-9