Elsevier

Information Sciences

Volume 424, January 2018, Pages 204-223
Community extraction and visualization in social networks applied to Twitter

https://doi.org/10.1016/j.ins.2017.09.022

Highlights

  • We propose an approach for social media analysis applied to Twitter’s network.

  • Community detection (Tribase) and interactive visualization (NLCOMS) are provided.

  • Tribase is assessed on the LFR benchmark showing its effectiveness.

  • Real-world data from the ANR Info-RSN project are considered.

  • The approach visually reveals community structure and hidden properties.

Abstract

Social network analysis is attracting growing interest from the scientific community. However, the data generated by social networks are increasingly difficult to analyse because their complexity hides the underlying patterns. In this work we propose an approach for social media analysis, especially for Twitter’s network. Our approach relies on two complementary steps: (i) community identification based on a new community detection algorithm called Tribase, and (ii) interactive community visualization, which provides gradual knowledge acquisition using our visualization tool, called NLCOMS. To assess the proposed approach, we tested it on real-world data from the ANR Info-RSN project. This project deals with information propagation and community detection in Twitter’s network, more precisely with a collection of tweets about media articles. The results show that our approach visually reveals the community structure and its related characteristics.

Introduction

The use of social networks is growing rapidly in our society, and it has deeply changed the way people react to events and interact with each other. Social network analysis (SNA) [56] is the field that studies these social behaviors. It has become increasingly popular over the last decade with the ubiquitous use of social networks. SNA relies on networks to investigate social interactions [33]. In many domains, such as biology, computer science and economics, a set of interacting entities forms a complex system [34] with hidden properties. These systems can be modelled as graphs, where the nodes represent the entities and the edges represent the relationships between them. For example, sociologists try to understand friendship in a social blog, phone calls between customers of a cellphone operator [4], or e-mails exchanged within an institution [51]. In some cases, a numerical value, or weight, can be assigned to each edge, yielding a weighted graph. The weight expresses either a strength or a quantity shared between two actors. For example, the weight in the e-mail network could be the number of e-mails exchanged between two employees. In this context, an effective network analysis should take edge weights into account. In this paper, we use edge-weighted graphs to model the “retweet”, a well-known relation on Twitter that occurs when a user republishes another user’s original tweet. The edge weight therefore corresponds to the number of retweets observed between two Twitter users.
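As an illustration of the weighted retweet model described above, the following minimal sketch counts retweets between pairs of users. The function name `build_retweet_graph`, the input format (pairs of retweeter and original author), the user names, and the choice to treat the relation as undirected and to ignore self-retweets are all assumptions made for this example; they are not taken from the paper.

```python
from collections import Counter


def build_retweet_graph(retweets):
    """Build an undirected, edge-weighted graph from retweet events.

    `retweets` is an iterable of (retweeter, original_author) pairs
    (a hypothetical input format). The result maps each unordered pair
    of users to the number of retweets observed between them.
    """
    weights = Counter()
    for retweeter, author in retweets:
        if retweeter == author:
            continue  # assumption: self-retweets carry no relational information
        weights[frozenset((retweeter, author))] += 1
    return dict(weights)


# Toy data: three retweets between alice and bob, one between carol and alice.
retweets = [
    ("alice", "bob"),
    ("bob", "alice"),
    ("carol", "alice"),
    ("alice", "bob"),
]
graph = build_retweet_graph(retweets)
for pair, weight in graph.items():
    print(sorted(pair), weight)
```

Keying edges by `frozenset` makes the pair order irrelevant, which matches a weight defined "between two users"; a directed variant would key on the ordered tuple instead.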

Furthermore, an important objective in social network analysis is to reveal semantic aspects behind the network topology. This objective can be reached by identifying communities, that is, groups of nodes that are densely connected internally but only sparsely connected to each other; this is known as the community detection problem [11]. The problem is challenging and arises in several domains. In sociology [13], for example, the aim is to study the common characteristics of social groups and what makes people group together in a community. To detect such communities, the first step of the approach proposed in this paper uses a community detection algorithm for weighted graphs that takes a collection of triangles as the skeleton of the community structure and as its starting point. Indeed, triangles are an important structure in social networks, reflecting the closeness of a community [40]. This closeness is expressed as the number of closed triads divided by the total number of triads, where a triad is a connected triplet of nodes. The closer this value is to 1, the higher the probability that two neighbors of a given node are themselves connected (i.e., that the node belongs to a triangle). Next, the algorithm repeatedly compares the intra-community and inter-community weights between groups, allowing dominant communities to grow.
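The two quantities mentioned in this paragraph can be sketched in a few lines. The first function computes the ratio of closed triads to all triads; the second sums edge weights inside and between the communities of a given partition. This is only an illustration of the quantities involved, not the Tribase algorithm itself: the function names, the data layout (adjacency sets, edge-weight dictionary, membership dictionary) and the toy graph are assumptions made for the example.

```python
from itertools import combinations


def closed_triad_ratio(adjacency):
    """Number of closed triads divided by the total number of triads.

    `adjacency` maps each node to the set of its neighbours (undirected).
    A triad is a connected triplet centred on a node; it is closed when
    the two outer nodes are also connected to each other.
    """
    closed = 0
    total = 0
    for neighbours in adjacency.values():
        for a, b in combinations(sorted(neighbours), 2):
            total += 1
            if b in adjacency[a]:
                closed += 1
    return closed / total if total else 0.0


def community_weights(edges, membership):
    """Sum edge weights inside each community and between community pairs.

    `edges` maps (u, v) tuples to weights; `membership` maps nodes to
    community labels. Returns (intra, inter) dictionaries.
    """
    intra, inter = {}, {}
    for (u, v), weight in edges.items():
        cu, cv = membership[u], membership[v]
        if cu == cv:
            intra[cu] = intra.get(cu, 0) + weight
        else:
            key = frozenset((cu, cv))
            inter[key] = inter.get(key, 0) + weight
    return intra, inter


# Toy graph: a triangle a-b-c with a pendant node d attached to c.
adjacency = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
edges = {("a", "b"): 3, ("b", "c"): 2, ("a", "c"): 1, ("c", "d"): 1}
membership = {"a": 0, "b": 0, "c": 0, "d": 1}

ratio = closed_triad_ratio(adjacency)
intra, inter = community_weights(edges, membership)
print(ratio, intra, inter)
```

In the toy graph, the single triangle accounts for three of the five triads, so the ratio is 0.6; the triangle community carries an internal weight of 6 against a weight of 1 towards the pendant node.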

Community detection is the preliminary step in grasping the underlying semantic and structural information in the network. Next comes the visualization issue: what is the most appropriate visualization for the detected communities and the hidden information? For example, a node-link diagram [4] is used to depict communities, whereas a word cloud [57] is used to visualize node attributes. In this work, in addition to the node-link representation, circle packing is used to visualize the detected communities. Synchronized and coordinated views, such as a bar chart, a partition layout and a word cloud, also help the expert user form their own ideas about the characteristics of the communities.

The remainder of this paper is organized as follows. In Section 2, we summarize the work related to the context of this paper. In Section 3, we introduce our community detection algorithm for weighted networks, called Tribase, and compare it with five community detection algorithms from the literature. The community visualization aspect and our visualization tool NLCOMS are presented in Section 4. Section 5 discusses the case study used to assess the approach proposed in this paper. Finally, Section 6 concludes the paper and suggests further improvements of this work.

Section snippets

Community detection

In this section, we present a non-exhaustive list of community detection algorithms. As the number of existing algorithms is huge, we list a representative subset that covers the best-known techniques. Broadly speaking, two main methods are available to extract communities in graphs. They are presented in the following subsections.

Community extraction via Tribase

In this section, we present a new community detection algorithm, called Tribase. The main idea of Tribase is to select a collection of triangles as a skeleton of the community structure. The motivations behind this are twofold. First, triangles play an important role in community structure and make it possible to capture the overall community organization in the graph, especially in social networks [14]. As an example, we can mention the friendship graph where there are more chances that two

Community visualization with NLCOMS

The second step of the approach proposed in this paper relies on an interactive visualization of the communities that provides gradual knowledge acquisition. Nevertheless, depicting a collection of communities with an appropriate visualization is as hard as the detection process, especially if information additional to the community structure has to be visualized. Furthermore, we have to avoid misleading user interpretation resulting from inappropriate community visualization even if the

Application to Twitter’s networks

To assess the applicability of the proposed approach to real-world data, we consider community detection in social networks, especially in Twitter. Twitter is a widely used social network that allows registered users to publish messages on the Internet, called tweets. It also provides online networking through user following and friendship. As a consequence of its wide use, Twitter attracts increasing interest from the scientific and professional fields, like sociologists, online reporters

Conclusion

In this work, we propose an approach for social media analysis, especially for Twitter’s network. Our approach relies on two complementary steps: (i) community identification based on a new community detection algorithm called Tribase, and (ii) interactive community visualization, which provides gradual knowledge acquisition using our visualization tool, called NLCOMS. The Tribase algorithm uses the triangles obtained by a feasible solution of the weighted MTP problem as a starting point for

Acknowledgment

This research has been supported by the Agence Nationale de la Recherche (ANR, France) through the Info-RSN Project (ANR-13-SOIN-0008). We would like to thank Prof. Arnaud Mercier and Prof. Nathalie Pignard-Cheynel for their feedback, as expert users, on the usability of NLCOMS. Finally, we would also like to thank the anonymous reviewers for their constructive comments and suggestions, which undoubtedly improved this publication.

References (57)

  • G. Csardi et al.

    The igraph software package for complex network research

    Int. J. Complex Systems

    (2006)
  • J. Ellson et al.

    Graphviz and dynagraph static and dynamic graph drawing tools

    Graph Drawing Software

    (2003)
  • S. Fortunato

    Community detection in graphs

    Phys. Rep.

    (2010)
  • S. Fortunato et al.

    Resolution limit in community detection

    Proc. Natl. Acad. Sci.

    (2007)
  • A. Friggeri et al.

    Triangles to capture social cohesion

    CoRR

    (2011)
  • M. Ghoniem et al.

    A comparison of the readability of graphs using node-link and matrix-based representations

  • M. Girvan et al.

    Community structure in social and biological networks

    PNAS

    (2002)
  • A. Gruzd, Netlytic: software for automated text and social network analysis, 2016...
  • N. Henry et al.

    MatrixExplorer: a dual-representation system to explore social networks

    IEEE Trans. Vis. Comput. Graph.

    (2006)
  • N. Henry et al.

    NodeTrix: a hybrid visualization of social networks

    IEEE Trans. Vis. Comput. Graph.

    (2007)
  • I. Herman et al.

    Graph visualization and navigation in information visualization: a survey

    IEEE Trans. Vis. Comput. Graph.

    (2000)
  • C. Klymko et al.

    Using triangles to improve community detection in directed networks

    CoRR

    (2014)
  • S.G. Kobourov

    Force-directed drawing algorithms

  • A. Lancichinetti et al.

    Benchmarks for testing community detection algorithms on directed and weighted graphs with overlapping communities

    Phys. Rev. E

    (2009)
  • A. Lancichinetti et al.

    Community detection algorithms: a comparative analysis

    Phys. Rev. E

    (2009)
  • A. Lancichinetti et al.

    Limits of modularity maximization in community detection

    CoRR

    (2011)
  • A. Lancichinetti et al.

    Benchmark graphs for testing community detection algorithms

    Phys. Rev. E

    (2008)