ABSTRACT
Networks of online news articles and blog posts are among the most commonly used data sets in network science. They have become a vital part of network analysis, where they are used to evaluate algorithms that operate on large networks and serve as examples in the analysis of information diffusion and propagation. Similarly, scientific citation networks have been studied for decades and are part of the bedrock upon which much of modern network analysis is built. In this paper, we show that the backbone inherent to networks of online news articles shares significant structural similarities with scientific citation networks once the noise of spurious links is stripped away. We present a data set of news articles that, although extremely sparse and lightweight, still contains information relevant to the propagation of information in mass media and is remarkably similar to scientific citation networks. This opens the door to the use of established methodologies from scientometrics and bibliometrics in the analysis of online news propagation.
Breaking the News: Extracting the Sparse Citation Network Backbone of Online News Articles