Description-oriented community detection using exhaustive subgroup discovery
Introduction
While classic community detection (see [17] for a survey) only identifies subgroups of densely connected nodes and provides no interpretable description, this paper focuses on description-oriented community detection. Using additional descriptive features of the nodes in the network, we identify communities as sets of nodes together with a description, i.e., a logical formula over the values of the nodes’ descriptive features. Such a community pattern provides an intuitive description of the community, e.g., an easily interpretable conjunction of attribute–value pairs. Classical community mining methods usually do not achieve this, since they treat the nodes of a network (e.g., users in a social network) as mere strings or ids.
We present an algorithm for description-oriented community detection that finds the top-k communities (described by community patterns) with respect to a number of standard community evaluation functions. The method is based on an adapted subgroup discovery approach [10], [36], and also tackles typical problems that standard community detection approaches do not address, such as pathological cases like very small communities. We focus on interpretable patterns that can easily be incorporated into a practical application, for example, for recommendations in social bookmarking systems. Note that we focus on static social graphs and do not take dynamics into account, since we aim to characterize a given community (allocation) for a given fixed interaction structure. Since in practice the entities in a network tend to belong to several different communities, the presented method naturally captures overlapping community allocations. Moreover, in contrast to global approaches, we focus on the discovery of local communities. Following the idea of local pattern mining, e.g., [20], we do not try to find a complete (global) partitioning of the network. Instead, we consider a set of local, potentially overlapping communities that should be as exceptional as possible with respect to a given community quality measure.
We demonstrate our approach on several social media applications such as social networking and social bookmarking systems that provide interaction networks like explicit friendship relations between users. However, the presented approach is not limited to such systems and can be applied to any kind of graph-structured data for which additional descriptive features (node labels) are available, e.g., certain activity in telephone networks or interactions in face-to-face contacts [6] that also utilize tags or topic descriptions for the contained relations.
As an accompanying example, throughout the paper we use the friendship graph of the social bookmarking system BibSonomy [15]. In BibSonomy, users can declare their friendship toward other users, thus creating a directed graph with users as nodes. At the same time, each user collects and tags resources like publications and web pages. Thus, a user’s set of tags can be considered a description of that user’s interests. The community mining task here is to find user groups whose members are well connected by their friendship links and share a common interest in one or more features (tags).
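To make the notion of a community pattern concrete, the following minimal sketch treats a pattern as a conjunction of tags, selects the users it covers, and collects the friendship links among them. The function names (`cover`, `induced_edges`) and the toy users, tags, and links are hypothetical and chosen only for illustration; they are not taken from BibSonomy or from the paper.

```python
from typing import Dict, FrozenSet, Set, Tuple

Edge = Tuple[str, str]


def cover(pattern: FrozenSet[str], user_tags: Dict[str, Set[str]]) -> Set[str]:
    """Return the users whose tag set contains every tag of the pattern.

    The pattern is read as a conjunction: tag_1 AND tag_2 AND ...
    """
    return {user for user, tags in user_tags.items() if pattern <= tags}


def induced_edges(nodes: Set[str], edges: Set[Edge]) -> Set[Edge]:
    """Return the directed friendship links with both endpoints in `nodes`."""
    return {(u, v) for (u, v) in edges if u in nodes and v in nodes}


# Hypothetical toy data: each user is described by a set of tags.
user_tags = {
    "alice": {"python", "web"},
    "bob": {"python", "web", "ml"},
    "carol": {"python", "ml"},
    "dave": {"music"},
}

# Hypothetical directed friendship links (source declares friendship to target).
friendship = {
    ("alice", "bob"),
    ("bob", "alice"),
    ("bob", "carol"),
    ("carol", "dave"),
}

pattern = frozenset({"python", "web"})
members = cover(pattern, user_tags)
inner = induced_edges(members, friendship)
print(sorted(members), sorted(inner))
```

A community quality measure would then score `members` using `inner` and the remaining links of the graph; the description `pattern` is what makes the resulting group interpretable.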
Overall, the contribution of this paper can be summarized as follows:
1. We first introduce description-oriented community detection and present the COMODO algorithm for obtaining the k-best community patterns using a given community evaluation measure. COMODO is a branch-and-bound algorithm based on an exhaustive subgroup discovery approach.
2. For fast description-oriented community detection using COMODO, we propose optimistic estimates [25], [62] which are efficient to compute. We consider a number of standard community quality functions: the segregation index [19], the inverse average ODF (out degree fraction) [38], and the modularity [49]. We discuss the different measures for unweighted and weighted graphs, and extend the optimistic estimates accordingly.
3. We evaluate the presented approach using five data sets from three real-world social applications, i.e., from the social bookmarking systems BibSonomy and delicious, and from the social media platform last.fm.
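The interplay of top-k search and optimistic estimates named in the contributions above can be illustrated with a small sketch. It is not the COMODO algorithm itself: it is a plain depth-first branch-and-bound over tag conjunctions, assuming an undirected, unweighted graph, and it scores a pattern by the standard per-community modularity term m_S/m - (vol(S)/(2m))^2. The bound used for pruning follows from vol(S') >= 2 m_S' and m_S' <= m_S for every specialization S' of S; all names (`local_modularity`, `optimistic_estimate`, `top_k_patterns`, `min_size`, `max_len`) and the toy graph are assumptions made for this example.

```python
import heapq
from itertools import count


def local_modularity(nodes, edges, degree, m):
    """Per-community modularity term m_S/m - (vol(S)/(2m))^2; also returns m_S."""
    m_s = sum(1 for u, v in edges if u in nodes and v in nodes)
    vol = sum(degree[u] for u in nodes)
    return m_s / m - (vol / (2 * m)) ** 2, m_s


def optimistic_estimate(m_s, m):
    """Upper bound on the quality of any subset of a group with m_s inner edges.

    With x = m_S'/m <= m_s/m and vol(S') >= 2 m_S', quality <= x - x^2,
    which is increasing up to x = 0.5 (maximum 0.25).
    """
    x = min(m_s / m, 0.5)
    return x - x * x


def top_k_patterns(node_tags, edges, k, max_len=3, min_size=2):
    """Return the k best tag conjunctions and the number of evaluated patterns."""
    m = len(edges)
    degree = {u: 0 for u in node_tags}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    tags = sorted({t for ts in node_tags.values() for t in ts})
    best = []          # min-heap of (quality, tie-breaker, pattern)
    tie = count()
    evaluated = 0

    def search(pattern, nodes, start):
        nonlocal evaluated
        for i in range(start, len(tags)):
            new_nodes = {u for u in nodes if tags[i] in node_tags[u]}
            if len(new_nodes) < min_size:   # minimum community size constraint
                continue
            new_pattern = pattern + (tags[i],)
            quality, m_s = local_modularity(new_nodes, edges, degree, m)
            evaluated += 1
            if len(best) < k:
                heapq.heappush(best, (quality, next(tie), new_pattern))
            elif quality > best[0][0]:
                heapq.heapreplace(best, (quality, next(tie), new_pattern))
            threshold = best[0][0] if len(best) == k else float("-inf")
            # Branch only if some specialization could still enter the top k.
            if len(new_pattern) < max_len and optimistic_estimate(m_s, m) > threshold:
                search(new_pattern, new_nodes, i + 1)

    search((), set(node_tags), 0)
    ranked = sorted(((q, p) for q, _, p in best), reverse=True)
    return ranked, evaluated


# Hypothetical toy graph: two triangles joined by one bridge edge (c, d).
node_tags = {
    "a": {"ml", "python"}, "b": {"ml"}, "c": {"ml"},
    "d": {"web", "python"}, "e": {"web"}, "f": {"web"},
}
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]

result, evaluated = top_k_patterns(node_tags, edges, k=2)
for quality, pattern in result:
    print(" AND ".join(pattern), round(quality, 4))
```

The pruning step is safe only because the estimate never underestimates the quality of a specialization; a different quality function, or a directed or weighted graph, needs its own bound.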
The remainder of the paper is structured as follows: Section 2 summarizes basics of subgroup discovery, and provides general notions of graphs and community mining measures. Next, Section 3 introduces the proposed approach for description-oriented community detection and presents a number of optimistic estimates for standard community evaluation functions. After that, Section 4 discusses related work. For demonstrating the effectiveness and validity of the presented approach, Section 5 provides experiments using five data sets and discusses their results in the context of the three real-world applications. Finally, Section 6 concludes the paper with a summary and directions for future research.
Preliminaries
In the following, we briefly introduce basic notions with respect to pattern mining using subgroup discovery, graphs, and community quality measures.
Description-oriented community detection
Many community mining algorithms collect sets of nodes denoting the individual communities, focusing on structural aspects of the graph; typically there is no simple and easily interpretable description. In our example, a user community would be represented merely as a set of names (strings) or ids. To bridge this gap, we combine community detection and subgroup discovery in a unified approach for mining community patterns. This tackles one of the basic problems of community detection in many
Related work
Community detection methods can be classified according to several dimensions. We distinguish between methods that detect disjoint communities, i.e., where actors in a network can only belong to exactly one community, and those that allow overlapping communities, where actors can belong to multiple communities at the same time. Furthermore, we distinguish between methods that work on extended (attributed) graphs, e.g., with descriptive information about the nodes, and methods that work on the
Experiments
In the following, we first describe the data sets, before we present the conducted experiments and discuss the results. We focus on evaluating the efficiency of the presented pruning approach considering the search steps of the COMODO algorithm. Furthermore, we discuss properties of the discovered communities in order to assess their validity.
Conclusions
In this paper, we have presented an approach for description-oriented community detection using exhaustive subgroup discovery. We presented the COMODO algorithm for the discovery of community patterns. Furthermore, we proposed suitable optimistic estimates for a range of standard community quality functions; the optimistic estimates are efficient to compute and enable an effective approach. Our proposed method ensures that the top-k communities (representable by a given set of describing
Acknowledgements
This work has been partially supported by the VENUS research cluster at the Interdisciplinary Research Center for Information System Design (ITeG) at Kassel University, and by the Commune project funded by the Hertie Foundation.
References (66)
- Community detection in graphs, Phys. Rep. (2010)
- Graph clustering, Comput. Sci. Rev. (2007)
- et al., Identifying social communities by frequent pattern mining
- Knowledge-Intensive Subgroup Mining – Techniques for Automatic and Interactive Discovery
- Data mining on social interaction networks, J. Data Min. Digital Humanities (2014)
- Analyzing and grounding social interaction in online and offline networks
- Subgroup discovery – advanced review, WIREs: Data Min. Knowl. Discov. (2015)
- et al., Face-to-face contacts at a conference: dynamics of communities and roles
- et al., Fast subgroup discovery for continuous target concepts
- M. Atzmueller, F. Lemmerich, B. Krause, A. Hotho, Who are the Spammers? Understandable local patterns for concept...