Community extraction and visualization in social networks applied to Twitter
Introduction
The use of social networks is growing exponentially in our society, leading to a deep change in the way people react to events and interact with each other. Social network analysis (SNA) [56] is the field that studies these social behaviors. It has become increasingly popular in the last decade with the ubiquitous use of social networks. SNA relies on networks to investigate social interactions [33]. In many domains, such as biology, computer science and economics, a set of interacting entities gives rise to complex systems [34] with hidden properties. These systems can be modelled as graphs, where the nodes represent the entities and the edges represent the relationships between them. For example, sociologists try to understand friendships in a social blog, the phone calls between customers of a cellphone operator [4], or the e-mails exchanged within an institution [51]. In some cases, a numerical value can be assigned to the otherwise binary edges. This value expresses either a strength or a shared quantity between two actors, and the resulting graphs are called weighted graphs. For example, the weight in the e-mail network could be the number of e-mails exchanged between two employees. In this context, an efficient network analysis approach should take the edge weights into account. In this paper, we use edge-weighted graphs to model the “retweet”, a well-known relation on Twitter, which occurs when a Twitter user republishes an original tweet of another Twitter user. The edge weight therefore corresponds to the number of retweets observed between two Twitter users.
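As an illustration of how such a weighted retweet graph can be built, the following minimal Python sketch aggregates a list of observed retweets into edge weights. It assumes that each retweet is given as a hypothetical `(retweeter, original_author)` pair, that the retweet direction is ignored (the graph is undirected) and that self-retweets are discarded; these are assumptions made for the example, not details taken from the paper, and the user names are toy data.

```python
from collections import Counter

def build_retweet_graph(retweets):
    """Aggregate observed retweets into an undirected weighted graph.

    retweets: iterable of (retweeter, original_author) pairs, one pair per retweet.
    Returns a dict mapping a sorted (user, user) pair to the number of retweets
    observed between these two users, in either direction.
    """
    weights = Counter()
    for retweeter, author in retweets:
        if retweeter == author:
            continue  # assumption: self-retweets are discarded (no self-loops)
        # assumption: direction is ignored, so (u, v) and (v, u) share one edge
        edge = tuple(sorted((retweeter, author)))
        weights[edge] += 1
    return dict(weights)

# Toy data for illustration only: "alice" retweets "bob" twice, etc.
retweets = [
    ("alice", "bob"),
    ("alice", "bob"),
    ("bob", "alice"),
    ("carol", "bob"),
]
graph = build_retweet_graph(retweets)
print(graph)
```

Here the edge between alice and bob gets weight 3, because three retweets were observed between them (two in one direction, one in the other), while the edge between bob and carol gets weight 1.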
Furthermore, an important objective in social network analysis is to reveal the semantic aspects behind the network topology. This objective can be reached by identifying communities, i.e., groups of nodes that are densely connected internally but sparsely connected to each other; this is known as the community detection problem [11]. This problem is very challenging and arises in several domains. For example, it arises in sociology [13], where the aim is to study the common characteristics of social groups and what makes people group together in a community. To detect such communities, the first step of the approach proposed in this paper is a community detection algorithm for weighted graphs that uses a collection of triangles as a starting point to build the skeleton of the community structure. Indeed, triangles are an important structure in social networks, reflecting the closeness of a community [40]. This closeness is expressed as the number of closed triads over the total number of triads, where a triad is a connected triplet of nodes. The closer this value is to 1, the higher the probability that the neighbors of a given node are connected to each other (i.e., that the node belongs to a triangle). Next, the algorithm iteratively compares the intra-community and inter-community weights between groups, allowing dominant communities to grow.
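The triad-based closeness measure and the idea of a triangle skeleton can be sketched in Python as follows. The `transitivity` function computes the ratio of closed triads to connected triads (the global clustering coefficient) described above. The `greedy_triangle_packing` function is a hypothetical, simple heuristic that picks the heaviest triangles first while keeping them vertex-disjoint, which yields a feasible (not necessarily optimal) triangle packing; it is not the Tribase procedure, and defining a triangle's weight as the sum of its three edge weights is an assumption. The toy graph is invented for illustration.

```python
from itertools import combinations

def adjacency(weights):
    """Turn {(u, v): w} edge weights into an undirected adjacency dict."""
    adj = {}
    for (u, v), w in weights.items():
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj

def transitivity(adj):
    """Closed triads / connected triads (global clustering coefficient)."""
    connected = closed = 0
    for nbrs in adj.values():
        # a and b are two neighbors of the same center node: a connected triad
        for a, b in combinations(nbrs, 2):
            connected += 1
            if b in adj[a]:
                closed += 1  # the triad is closed by the edge a - b
    return closed / connected if connected else 0.0

def triangles(adj):
    """List each triangle once as ((u, v, x), weight), weight = sum of its edge weights.

    Assumes node labels are comparable (e.g. strings), so u < v < x avoids duplicates.
    """
    found = []
    for u in adj:
        for v in adj[u]:
            if v <= u:
                continue
            for x in adj[u].keys() & adj[v].keys():
                if x > v:
                    found.append(((u, v, x), adj[u][v] + adj[u][x] + adj[v][x]))
    return found

def greedy_triangle_packing(adj):
    """Hypothetical heuristic: heaviest triangles first, kept vertex-disjoint."""
    used, packing = set(), []
    for tri, _ in sorted(triangles(adj), key=lambda t: (-t[1], t[0])):
        if used.isdisjoint(tri):
            packing.append(tri)
            used.update(tri)
    return packing

# Toy weighted graph (illustrative only): triangles abc, cde and def.
edges = {("a", "b"): 3, ("a", "c"): 2, ("b", "c"): 4,
         ("c", "d"): 1, ("c", "e"): 1, ("d", "e"): 2,
         ("d", "f"): 1, ("e", "f"): 5}
adj = adjacency(edges)
print(transitivity(adj))
print(greedy_triangle_packing(adj))
```

On this toy graph, 9 of the 15 connected triads are closed, so the transitivity is 0.6. The greedy heuristic keeps the triangles (a, b, c) and (d, e, f) and drops (c, d, e), which shares node c with the heavier triangle (a, b, c).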
Community detection is a preliminary step toward grasping the underlying semantic and structural information in the network. Next comes the visualization issue: what is the most appropriate visualization for the detected communities and the hidden information? For example, a node-link diagram [4] is used to depict communities, whereas a word cloud [57] is used to visualize node attributes. In this work, in addition to the node-link representation, circle packing is used to visualize the detected communities. In addition, synchronized and coordinated views, such as a bar chart, a partition layout and a word cloud, help the expert user form his/her own ideas about the characteristics of the communities.
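Views such as circle packing and word clouds need per-community input data. The following Python sketch, under stated assumptions, prepares two such inputs from a detected partition: a circle radius for each community, chosen so that the circle's area equals the community size (this covers only the sizing, not the packing layout itself), and term frequencies for a community word cloud. The `words` node attribute (e.g., hashtags per user), the function names and the toy data are hypothetical; this is not how NLCOMS is implemented.

```python
import math
from collections import Counter

def circle_radii(partition):
    """Map each community to a radius whose circle area equals the community size.

    partition: dict node -> community id (e.g. the output of a detection algorithm).
    """
    sizes = Counter(partition.values())
    # area = pi * r**2 = size  =>  r = sqrt(size / pi)
    return {community: math.sqrt(size / math.pi) for community, size in sizes.items()}

def word_cloud_weights(partition, words, community, top=3):
    """Count the terms attached to the members of one community.

    words: hypothetical node attribute, dict node -> list of terms (e.g. hashtags).
    Returns the `top` most frequent (term, count) pairs.
    """
    counts = Counter()
    for node, c in partition.items():
        if c == community:
            counts.update(words.get(node, []))
    return counts.most_common(top)

# Toy data for illustration only.
partition = {"alice": 0, "bob": 0, "carol": 0, "dave": 1}
words = {
    "alice": ["#election", "#debate"],
    "bob": ["#election"],
    "carol": ["#election", "#debate", "#tv"],
    "dave": ["#football"],
}

radii = circle_radii(partition)
print({c: round(r, 3) for c, r in radii.items()})
print(word_cloud_weights(partition, words, community=0, top=2))
```

Scaling by area rather than by radius keeps the visual weight of each circle proportional to the number of members, so a community of three users looks three times as large as a single-user community rather than nine times.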
The remainder of this paper is organized as follows. Section 2 summarizes related work. In Section 3, we introduce our community detection algorithm for weighted networks, called Tribase, and compare it with five community detection algorithms from the literature. The community visualization aspect and our visualization tool, NLCOMS, are presented in Section 4. Section 5 discusses the case study used to assess the proposed approach. Finally, Section 6 concludes the paper and suggests further improvements of this work.
Community detection
In this section, we present a non-exhaustive list of community detection algorithms. As the number of existing algorithms is huge, we list a representative subset covering the best-known techniques. Broadly speaking, two main families of methods are available to extract communities in graphs. They are presented in the following subsections.
Community extraction via Tribase
In this section, we present a new community detection algorithm, called Tribase. The main idea of Tribase is to select a collection of triangles as the skeleton of the community structure. The motivations behind this are twofold. First, triangles play an important role in community structure and capture the overall community organization in the graph, especially in social networks [14]. For example, we can mention the friendship graph, where there are more chances that two
Community visualization with NLCOMS
The second step of the approach proposed in this paper relies on an interactive visualization of the communities that provides gradual knowledge acquisition. However, depicting a collection of communities with an appropriate visualization is as hard as the detection process itself, especially if information beyond the community structure has to be visualized. Furthermore, we have to avoid misleading user interpretation resulting from inappropriate community visualization even if the
Application to Twitter’s networks
In order to assess the applicability of the proposed approach to real-world data, we consider community detection in social networks, specifically in Twitter. Twitter is a widely used social network that allows registered users to publish messages, called tweets, on the Internet. It also provides online networking through following and friendship relations. As a consequence of its widespread use, Twitter attracts increasing interest from scientists and professionals, such as sociologists, online reporters
Conclusion
In this work, we propose an approach for social media analysis, especially for Twitter’s network. Our approach relies on two complementary steps: (i) community identification based on a new community detection algorithm called Tribase, and (ii) an interactive community visualization, which provides gradual knowledge acquisition using our visualization tool, called NLCOMS. The Tribase algorithm uses the triangles obtained by a feasible solution of the weighted maximum triangle packing (MTP) problem as a starting point for
Acknowledgment
This research has been supported by the Agence Nationale de la Recherche (ANR, France) during the Info-RSN Project (ANR-13-SOIN-0008). We would like to thank Prof. Arnaud Mercier and Prof. Nathalie Pignard-Cheynel for their feedback, as expert users, on the usability of NLCOMS. Finally, we would also like to thank the anonymous reviewers for their constructive comments and suggestions, which undoubtedly improved this publication.
References (57)
- et al., Branch-and-bound algorithm for the maximum triangle packing problem, Comput. Ind. Eng. (2015)
- et al., An improved randomized approximation algorithm for maximum triangle packing, Discrete Appl. Math. (2009)
- The Development of Social Network Analysis: A Study in the Sociology of Science (2004)
- et al., Community detection in networks with node attributes, CoRR (2014)
- M. Bastian, S. Heymann, M. Jacomy, Gephi: An open source software for exploring and manipulating networks, 2009...
- Sémiologie Graphique : Les Diagrammes, Les Réseaux, Les Cartes (1967)
- et al., Fast unfolding of communities in large networks, J. Stat. Mech (2008)
- P. Bródka, T. Filipowski, P. Kazienko, An Introduction to Community Detection in Multi-layered Social Network, Springer...
- et al., Meerkat: community mining with dynamic social networks, 2010 IEEE International Conference on Data Mining Workshops (2010)
- et al., Finding community structure in very large networks, Phys. Rev. E (2004)