Lifespan and propagation of information in On-line Social Networks: A case study based on Reddit

https://doi.org/10.1016/j.jnca.2015.06.006

Abstract

Since 1950, information flows have been at the center of scientific research. Until the Internet became widespread in the late 1990s, information flow studies were based on traditional offline social networks. Since the first Online Social Network (OSN) studies, various observations of “offline” information flows, such as the two-step flow of communication and the importance of weak ties, have been verified in several “online” studies, which also indicate that information flows from one OSN to several others. Within that flow, information is shared with, and reproduced by, the users of each network. Furthermore, the original content is enhanced or weakened according to its topic, as well as the dynamic nature and exposure of each OSN. In such an informationally connected environment, each OSN is considered a layer of information flows that interacts with other layers. We examine information flows in several social networks, as well as their diffusion and lifespan across these networks, based on user-generated content. Our results verify the information connection between various OSNs and provide a measurement of shared information lifetime in multiple OSNs.

Introduction

Information is constantly exchanged online among friends, acquaintances, family members, colleagues and even unknown individuals. The type of information varies and includes local or world news, general and scientific facts, quotes, personal preferences, etc. This broad information exchange would not be possible without a communication environment such as the Internet. Moreover, the creation of Online Social Networks (OSNs) and their adoption in our everyday lives have led to the development of new information-sharing schemes in which users can disseminate information easily and quickly.

The information that flows in OSNs, along with its characteristics, properties and impact, has been the subject of several previous studies (as discussed in Section 2), which mainly focused on the following:

  • a. Virality: the tendency of information to be circulated rapidly and widely across different Web users.

  • b. Diffusion and propagation: how fast information is reproduced and spread in OSNs.

  • c. Dynamics: properties that constitute, sustain, or modify the topology of the OSNs based on the diffusion and propagation processes (hierarchy, network partition, clustering etc.).

  • d. Influence: the capacity of OSN power nodes (e.g. popular persons, news media, opinion makers) to be a compelling force on behavior.

However, nowadays, OSNs are densely connected to each other, through multiple information flows.

Considering every OSN as a layer (Fig. 1) and the information as the links connecting these layers, we propose the concept of multilayer information flow. In this concept, information spreads from a source layer and propagates to multiple other layers. To evaluate our proposal, we decided to focus on Reddit and its content, mainly because this OSN comprises original content along with relayed information, and offers a fairly liberal data access policy, with an open Application Programming Interface (API) and the required documentation for data scraping.

Reddit is a social news and entertainment site powered by user-generated content. Registered users submit content through a descriptive link that may contain an image, meme, video, question or Ask Me Anything (AMA) session, and the community can then vote and comment on that post. In accordance with these votes, users who have created a post or commented on one gain or lose “karma”, a Reddit-specific metric for user ranking. This metric is calculated as the sum of all the upvotes minus the sum of all the downvotes a user receives. Posts that acquire a high vote ratio (positive to negative) in a short time period after their submission are moved to the front page. The Reddit community thus defines the popularity of the disseminated content and determines its “success” or “failure”.
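The karma definition above can be written out as a short sketch. The `Submission` record and the function names are hypothetical, introduced only for illustration. The code mirrors the textual definition (upvotes minus downvotes, and the positive-to-negative vote ratio) and is not meant to reproduce Reddit's actual ranking implementation.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Submission:
    """Hypothetical record for one post or comment by a user."""
    upvotes: int
    downvotes: int


def karma(submissions: Iterable[Submission]) -> int:
    """Sum of all upvotes minus sum of all downvotes a user received."""
    items = list(submissions)
    return sum(s.upvotes for s in items) - sum(s.downvotes for s in items)


def vote_ratio(s: Submission) -> float:
    """Positive-to-negative vote ratio of a single submission.

    Assumption: a submission with no downvotes gets an infinite ratio
    if it has at least one upvote, and 0.0 if it has no votes at all.
    """
    if s.downvotes == 0:
        return float("inf") if s.upvotes > 0 else 0.0
    return s.upvotes / s.downvotes


# Example: a user with two submissions.
history = [Submission(upvotes=10, downvotes=3), Submission(upvotes=5, downvotes=1)]
user_karma = karma(history)
first_ratio = vote_ratio(history[0])
```

Under this reading, karma aggregates over everything a user has submitted, while the vote ratio is a per-post quantity.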

Among the content posted on Reddit, we focused on posts that link to an external domain whose traffic can be easily measured (e.g. by number of views). Thus, content linking to Wikipedia articles or news sites is not taken into consideration in this research. One of our initial observations was that the highest rated content was mostly from the ImgUr domain. ImgUr is one of the most popular online image hosting services in the Reddit community.

As illustrated in Fig. 1, content on Reddit can be (amongst others) an image or a set of images hosted in ImgUr, or a video in YouTube. In ImgUr, content is usually created at the same time as the corresponding post in Reddit. In the case of YouTube, most posts in Reddit are linked to old videos. A short time after content creation and the subsequent increase in popularity within Reddit, users from different OSNs start mentioning that content, either by citing Reddit or the domain where the content is hosted (ImgUr or YouTube in our case).
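The layered flow described above can be illustrated with a minimal sketch. It assumes that each layer (hosting domain or OSN) is represented by a list of timestamps at which the content was observed there, and it takes the "lifespan" in a layer to be the time between the first and last observed mention. Both the data layout and this lifespan definition are assumptions made for illustration, not the measurement procedure of this study, and all names and timestamps are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def layer_summary(mentions: Dict[str, List[datetime]],
                  source_layer: str) -> Dict[str, Dict[str, timedelta]]:
    """Per-layer propagation delay and lifespan.

    delay    = first mention in the layer minus first mention in the source layer
    lifespan = last mention in the layer minus first mention in the layer
    Layers without any observed mention are skipped.
    """
    origin = min(mentions[source_layer])
    summary = {}
    for layer, times in mentions.items():
        if not times:
            continue
        first, last = min(times), max(times)
        summary[layer] = {"delay": first - origin, "lifespan": last - first}
    return summary


def propagation_order(summary: Dict[str, Dict[str, timedelta]]) -> List[str]:
    """Layers sorted by the time at which the content first appeared in them."""
    return sorted(summary, key=lambda layer: summary[layer]["delay"])


# Hypothetical observations for one image hosted on ImgUr.
t0 = datetime(2014, 1, 1, 12, 0)
example = {
    "imgur": [t0, t0 + timedelta(hours=2)],
    "reddit": [t0 + timedelta(minutes=5), t0 + timedelta(hours=10)],
    "twitter": [t0 + timedelta(minutes=30), t0 + timedelta(hours=6)],
    "facebook": [],  # no public mentions found
}
example_summary = layer_summary(example, source_layer="imgur")
example_order = propagation_order(example_summary)
```

Treating the hosting domain as the source layer matches the ImgUr case, where content is created at roughly the same time as the Reddit post. For YouTube, where posts often link to older videos, the Reddit post itself may be the more natural origin.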

In this work, we wanted to use well-known and heavily visited OSNs. In this context, Twitter provided us with the ability to fully observe the impact of a front-paged Reddit post. In contrast, since most content on Facebook and Google Plus is private, we only discovered a fraction of the total references, derived from searching public posts.

It is easily observed that information is shared and spread among these social networks. A post in any of these networks impacts the others as well. But what is the size of this impact? How long does it last? Does it depend on the thematic category (e.g. politics, sports, entertainment etc.)? These are some of the questions we intend to explore in this work. Information flows and their diffusion, along with information virality, are the main aspects we address in answering them. Through the analysis of multiple social networks, we aim to present the multi-layered flow of information across modern OSNs.

The remainder of the paper is organized as follows. In Section 2, we present some of the most important research initiatives on information virality and information diffusion along with their results in social networks. Section 3 describes the data mining methods and the dataset we used. In Section 4, we present the results derived from our analysis. Finally, Section 5 discusses evaluation issues of our study, while Section 6 concludes this work.

Section snippets

Related work

The topic of diffusion has been at the center of sociological interest for many years. Even before the emergence of OSNs, social ties and information flows had been studied in traditional real-life social networks.

The notion that information flows from mass media to opinion leaders, and later on to a wider population as final consumers, was first introduced in the mid-1940s (Lazarsfeld et al., 1944). In their introduction, Lazarsfeld et al. found that, during a presidential election,

Methodology – dataset description

Our research is focused on Reddit, a social news and entertainment site. Reddit’s content is generated and ranked by users, based on positive/negative (up/down respectively) votes. Newly created posts with a high enough rank reach the front page. The submitted content often links to an external domain and varies from simple news posts, political articles and Ask Me Anything sessions to entertaining pictures. The view count on their original source is not always

Results

In this section, we present and analyze the results derived from our scraping procedure. Data are separated based on topic, subreddit category and hosting domain. Thus, “new” and “rising” in every chart denote the corresponding (subreddit) category, while ImgUr and YouTube define the hosting domain. Reddit posts include topics such as “AdviceAnimals”, “Aww”, “EarthPorn”, “Funny”, “Gaming”, “Gifs”, “Movies”, “Music”, “Pics”, “TIL”, “Videos” and “WTF”, and all were discovered with our

Discussion

Our analysis verified many observations of previous research, mainly with respect to the micro- and macroeffects of information flows (Lazarsfeld et al., 1944; Katz and Lazarsfeld, 1970; Karnik et al., 2013). More specifically, a single post (microeffect) in the parent domain connected with a post in Reddit starts to accumulate views (macroeffect) up to a point where the information hops to OSNs (first to Twitter and then to Facebook), flowing through individuals and eventually the interest

Conclusion – future work

Information starts within a domain and hops to various other media within minutes. However, once it has propagated, the information flow within the original domain does not stop; it merely slows down after a short period of time. Entertainment content and positively emotive content are the most “viral”. While persistence was only found in gaming posts, posts with movie content were the only ones that spread nearly simultaneously to every domain and OSN. We should consider a new perception of

References (32)

  • J.L. Iribarren et al. Affinity paths and information diffusion in social networks. Soc Netw (2011)
  • A. Karnik et al. On the diffusion of messages in on-line social networks. Perform Eval (2013)
  • D.T. Allsop et al. Word-of-mouth research: principles and applications. J Advert Res (2007)
  • Bakshy E, Itamar R, Cameron M, Adamic L. The role of social networks in information diffusion. In: Proceedings of the...
  • Berger J, Milkman K. What makes online content viral? Wharton research paper; 2010....
  • F. Bonchi. Influence propagation in social networks: a data mining perspective. IEEE Intell Inform Bull (2011)
  • Goyal A, Bonchi F, Lakshmanan LVS. Learning influence probabilities in social networks. In: Proceedings of the 3rd ACM...
  • M.S. Granovetter. The strength of weak ties. Am J Sociol (1973)
  • Guerini M, Strapparava C, Özbal G. Exploring text virality in social networks. In: Proceedings of ICWSM;...
  • Guerini M, Pepe A, Lepri B. Do linguistic style and readability of scientific abstracts affect their virality?...
  • Guille A, Hacid H. A predictive model for the temporal dynamics of information diffusion in online social networks. In:...
  • Hansen LK, Arvidsson A, Nielsen FÅ, Colleoni E, Etter M., Good friends, bad news-affect and virality in twitter, future...
  • Ienco D, Bonchi F, Castillo C., The meme ranking problem: maximizing microblogging virality, data mining workshops;...
  • S. Jurvetson, and T. Draper, Viral marketing: viral marketing phenomenon explained, DFJ Netw News,...
  • E. Katz et al. Personal influence, the part played by people in the flow of mass communications (1970)
  • Kwak H, Lee C, Park H, Moon S. What is twitter, a social network or a news media? Proceedings of the 19th international...