Information Sciences

Volume 329, 1 February 2016, Pages 965-984

Description-oriented community detection using exhaustive subgroup discovery

https://doi.org/10.1016/j.ins.2015.05.008

Abstract

Communities can intuitively be defined as subsets of nodes of a graph with a dense structure in the corresponding subgraph. However, mining such communities usually takes only structural aspects into account. Typically, neither a concise nor an easily interpretable community description is provided.

To tackle this issue, this paper focuses on description-oriented community detection using subgroup discovery. In order to provide communities that are both structurally valid and interpretable, we utilize the graph structure as well as additional descriptive features of the graph’s nodes. A descriptive community pattern built upon these features then describes and identifies a community, i.e., a set of nodes, and vice versa. Essentially, we mine patterns in the “description space” characterizing interesting sets of nodes (i.e., subgroups) in the “graph space”; the interestingness of a community is evaluated by a selectable quality measure.

We aim at identifying communities according to standard community quality measures, while providing characteristic descriptions of these communities at the same time. For this task, we propose several optimistic estimates of standard community quality functions to be used for efficient pruning of the search space in an exhaustive branch-and-bound algorithm. We demonstrate our approach in an evaluation using five real-world data sets, obtained from three different social media applications.

Introduction

Classic community detection (see, e.g., [17] for a survey) just identifies subgroups of nodes with a dense structure and lacks an interpretable description. In contrast, this paper focuses on the task of description-oriented community detection. Using additional descriptive features of the nodes contained in the network, we approach the task of identifying communities as sets of nodes together with a description, i.e., a logical formula on the values of the nodes’ descriptive features. Such a community pattern then provides an intuitive description of the community, e.g., by an easily interpretable conjunction of attribute–value pairs. This is usually not achieved by classical community mining methods that consider the nodes of a network (e.g., denoting users in a social network) as mere strings or ids.
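The idea of a community pattern as a conjunction of attribute–value pairs can be illustrated with a minimal sketch. The node ids, attributes, and the helper name `cover` below are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from typing import Dict, Hashable, Set

# Hypothetical toy data: node id -> descriptive features (attribute -> value).
NODE_FEATURES: Dict[Hashable, Dict[str, str]] = {
    "u1": {"topic": "data mining", "country": "DE"},
    "u2": {"topic": "data mining", "country": "DE"},
    "u3": {"topic": "data mining", "country": "FR"},
    "u4": {"topic": "music", "country": "DE"},
}


def cover(pattern: Dict[str, str], features=NODE_FEATURES) -> Set[Hashable]:
    """Return all nodes whose features satisfy every attribute-value pair."""
    return {
        node
        for node, attrs in features.items()
        if all(attrs.get(attr) == value for attr, value in pattern.items())
    }


general = cover({"topic": "data mining"})
specific = cover({"topic": "data mining", "country": "DE"})
print(sorted(general))   # ['u1', 'u2', 'u3']
print(sorted(specific))  # ['u1', 'u2']
```

Adding a further attribute–value pair can only shrink the described node set, which is the property that makes a systematic search over descriptions feasible.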

We present an algorithm for description-oriented community detection of the top-k communities (described by community patterns) with respect to a number of standard community evaluation functions. The method is based on an adapted subgroup discovery approach [10], [36], and also tackles typical problems that are not addressed by standard approaches for community detection such as pathological cases like small community sizes. We focus on interpretable patterns that can easily be incorporated into a practical application, for example, for recommendations in social bookmarking systems. It is important to note that we focus on static social graphs and do not take the dynamics into account since we aim to characterize a given community (allocation) for a given fixed interaction structure. Also, since in practice the entities in a network tend to belong to a number of different communities, the presented method naturally captures overlapping community allocations. Moreover, in contrast to global approaches, we focus on the discovery of local communities. According to the idea of local pattern mining, e.g., [20], we do not try to find a complete (global) partitioning of the network. Instead, we consider a set of local, potentially overlapping communities. These should be as exceptional as possible with respect to a given community quality measure.

We demonstrate our approach on several social media applications such as social networking and social bookmarking systems that provide interaction networks like explicit friendship relations between users. However, the presented approach is not limited to such systems and can be applied to any kind of graph-structured data for which additional descriptive features (node labels) are available, e.g., certain activity in telephone networks or interactions in face-to-face contacts [6] that also utilize tags or topic descriptions for the contained relations.

As an accompanying example, throughout the paper we use the friendship graph of the social bookmarking system BibSonomy [15]. In BibSonomy, users can declare their friendship toward other users, thus creating a directed graph with users as nodes. At the same time, each user collects and tags resources like publications and web pages. Thus, a user’s set of tags can be considered as a description of that user’s interests. The community mining task here is to find user groups in which users are well connected by their friendship links and share a common interest in one or more features (tags).
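A small sketch may help to make this task concrete. It assumes invented toy data (the user names, tags, and friendship links are not BibSonomy data) and hypothetical helper names; it selects the users that share a set of tags and then separates the directed friendship links starting in that group into internal and outgoing ones.

```python
# Hypothetical toy data in the spirit of the BibSonomy example (not real data).
USER_TAGS = {
    "anna": {"web", "semantic", "ontology"},
    "ben": {"web", "semantic"},
    "carl": {"web", "music"},
    "dora": {"music"},
}
# Directed friendship links: (source, target) means source declared target a friend.
FRIENDSHIPS = {
    ("anna", "ben"), ("ben", "anna"), ("ben", "carl"),
    ("carl", "dora"), ("anna", "dora"),
}


def users_with_tags(tags):
    """Users whose tag set contains every tag of the description."""
    return {user for user, own in USER_TAGS.items() if set(tags) <= own}


def split_links(members):
    """Split the links that start inside the group into internal and outgoing ones."""
    internal = {(s, t) for s, t in FRIENDSHIPS if s in members and t in members}
    outgoing = {(s, t) for s, t in FRIENDSHIPS if s in members and t not in members}
    return internal, outgoing


group = users_with_tags({"web", "semantic"})
internal, outgoing = split_links(group)
print(sorted(group))                 # ['anna', 'ben']
print(len(internal), len(outgoing))  # 2 2
```

A community quality measure would then turn such counts into a single score, so that tag descriptions with many internal and few outgoing links rank higher.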

Overall, the contribution of this paper can be summarized as follows:

  • 1. We first introduce description-oriented community detection and present the COMODO algorithm for obtaining the k-best community patterns using a given community evaluation measure. COMODO is a branch-and-bound algorithm based on an exhaustive subgroup discovery approach.

  • 2. For fast description-oriented community detection using COMODO, we propose optimistic estimates [25], [62] which are efficient to compute. We consider a number of standard community quality functions: the segregation index [19], the inverse average ODF (out degree fraction) [38], and the modularity [49]. We discuss the different measures for unweighted and weighted graphs, and extend the optimistic estimates accordingly.

  • 3. We evaluate the presented approach using five data sets from three real-world social applications, i.e., from the social bookmarking systems BibSonomy and delicious, and from the social media platform last.fm.
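To illustrate what an optimistic estimate is, the following sketch uses the usual per-community modularity term for an undirected, unweighted graph, MOD(S) = m_S/m − (vol(S)/(2m))², where m is the number of edges, m_S the number of edges inside S, and vol(S) the sum of the degrees of the nodes in S. The exact definitions and estimates used in the paper are not part of this excerpt, so the bound below is only one that follows directly from this formula: every subset S' of S has m_S' ≤ m_S and vol(S') ≥ 2·m_S', hence MOD(S') ≤ x − x² with x = m_S'/m ≤ m_S/m, and x − x² grows until x = 0.5, where it reaches 0.25. The function names and the toy graph are assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical toy graph: two triangles joined by one edge (undirected).
EDGES = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (5, 6), (4, 6)]


def internal_edge_count(edges, members):
    """Number of edges with both endpoints inside the node set."""
    return sum(1 for u, v in edges if u in members and v in members)


def modularity_contribution(edges, members):
    """Per-community modularity term m_S/m - (vol(S)/(2m))^2."""
    m = len(edges)
    vol = sum((u in members) + (v in members) for u, v in edges)
    return internal_edge_count(edges, members) / m - (vol / (2 * m)) ** 2


def optimistic_estimate(edges, members):
    """Upper bound on the modularity term of every subset of `members`."""
    x = internal_edge_count(edges, members) / len(edges)
    return 0.25 if x >= 0.5 else x - x * x


community = {1, 2, 3}
print(round(modularity_contribution(EDGES, community), 4))
print(round(optimistic_estimate(EDGES, community), 4))

# Exhaustive check of the bound on all non-empty subsets of a larger node set.
nodes = {1, 2, 3, 4}
bound = optimistic_estimate(EDGES, nodes)
subsets = [set(c) for r in range(1, 5) for c in combinations(sorted(nodes), r)]
print(all(modularity_contribution(EDGES, s) <= bound + 1e-12 for s in subsets))
```

Because refining a description can only remove nodes from the described set, such a bound on all subsets is also a bound on all refinements of a pattern, which is what a branch-and-bound search needs for pruning.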

The remainder of the paper is structured as follows: Section 2 summarizes basics of subgroup discovery, and provides general notions of graphs and community mining measures. Next, Section 3 introduces the proposed approach for description-oriented community detection and presents a number of optimistic estimates for standard community evaluation functions. After that, Section 4 discusses related work. For demonstrating the effectiveness and validity of the presented approach, Section 5 provides experiments using five data sets and discusses their results in the context of the three real-world applications. Finally, Section 6 concludes the paper with a summary and directions for future research.

Preliminaries

In the following, we briefly introduce basic notions with respect to pattern mining using subgroup discovery, graphs, and community quality measures.

Description-oriented community detection

Many community mining algorithms collect sets of nodes denoting the individual communities, focusing on structural aspects of the graph; typically there is no simple and easily interpretable description. In our example, a user community would be represented merely as a set of names (strings) or ids. To bridge this gap, we combine community detection and subgroup discovery in a unified approach for mining community patterns. This tackles one of the basic problems of community detection in many

Related work

Community detection methods can be classified according to several dimensions. We distinguish between methods that detect disjoint communities, i.e., where actors in a network can only belong to exactly one community, and those that allow overlapping communities, where actors can belong to multiple communities at the same time. Furthermore, we distinguish between methods that work on extended (attributed) graphs, e.g., with descriptive information about the nodes, and methods that work on the

Experiments

In the following, we first describe the data sets, before we present the conducted experiments and discuss the results. We focus on evaluating the efficiency of the presented pruning approach considering the search steps of the COMODO algorithm. Furthermore, we discuss properties of the discovered communities in order to assess their validity.
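The role of search steps can be illustrated with a small top-k branch-and-bound search over tag conjunctions. This is a simplified sketch, not the COMODO implementation: the toy data, the function names, the depth-first enumeration order, and the use of the per-community modularity term m_S/m − (vol(S)/(2m))² with the bound x − x² (capped at 0.25, x = m_S/m) are all assumptions made for illustration.

```python
import heapq
from itertools import count

# Hypothetical toy data: node -> descriptive tags, plus an undirected graph.
FEATURES = {
    1: {"a", "b"}, 2: {"a", "b"}, 3: {"a", "b", "c"},
    4: {"a", "c"}, 5: {"c"}, 6: {"c", "d"},
}
EDGES = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (5, 6), (4, 6)]
M = len(EDGES)


def cover(pattern):
    """Nodes that carry every tag of the pattern."""
    return frozenset(n for n, tags in FEATURES.items() if pattern <= tags)


def quality(members):
    """Per-community modularity term of the node set."""
    m_s = sum(1 for u, v in EDGES if u in members and v in members)
    vol = sum((u in members) + (v in members) for u, v in EDGES)
    return m_s / M - (vol / (2 * M)) ** 2


def optimistic_estimate(members):
    """Upper bound on the quality of every subset of `members`."""
    x = sum(1 for u, v in EDGES if u in members and v in members) / M
    return 0.25 if x >= 0.5 else x - x * x


def top_k_patterns(k, prune=True):
    """Return the k best patterns and the number of evaluated candidates."""
    selectors = sorted(set().union(*FEATURES.values()))
    best = []       # min-heap of (quality, tie_breaker, pattern)
    tie = count()   # unique tie-breaker so patterns are never compared
    steps = 0

    def search(pattern, start):
        nonlocal steps
        for i in range(start, len(selectors)):
            candidate = pattern | {selectors[i]}
            members = cover(candidate)
            steps += 1
            if not members:
                continue  # no node matches; every refinement is empty too
            q = quality(members)
            if len(best) < k:
                heapq.heappush(best, (q, next(tie), candidate))
            elif q > best[0][0]:
                heapq.heapreplace(best, (q, next(tie), candidate))
            # Skip the subtree if no refinement can beat the k-th best quality.
            if prune and len(best) == k and optimistic_estimate(members) <= best[0][0]:
                continue
            search(candidate, i + 1)

    search(frozenset(), 0)
    ranked = sorted(best, key=lambda entry: -entry[0])
    return [(sorted(p), q) for q, _, p in ranked], steps


pruned_result, pruned_steps = top_k_patterns(k=2, prune=True)
full_result, full_steps = top_k_patterns(k=2, prune=False)
print(pruned_result == full_result)  # True
print(pruned_steps, full_steps)
```

On this toy input both runs return the same top-k patterns, while the pruned run evaluates fewer candidates; comparing such step counts is the kind of efficiency measurement described above.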

Conclusions

In this paper, we have presented an approach for description-oriented community detection using exhaustive subgroup discovery. We presented the COMODO algorithm for the discovery of community patterns. Furthermore, we proposed suitable optimistic estimates for a range of standard community quality functions; the optimistic estimates are efficient to compute and enable an effective approach. Our proposed method ensures that the top-k communities (representable by a given set of describing

Acknowledgements

This work has been partially supported by the VENUS research cluster at the Interdisciplinary Research Center for Information System Design (ITeG) at Kassel University, and by the Commune project funded by the Hertie Foundation.

References (66)

  • S. Fortunato, Community detection in graphs, Phys. Rep. (2010)
  • S.E. Schaeffer, Graph clustering, Comput. Sci. Rev. (2007)
  • M. Adnan et al., Identifying social communities by frequent pattern mining
  • M. Atzmueller, Knowledge-Intensive Subgroup Mining – Techniques for Automatic and Interactive Discovery
  • M. Atzmueller, Data mining on social interaction networks, J. Data Min. Digital Humanities (2014)
  • M. Atzmueller, Analyzing and grounding social interaction in online and offline networks
  • M. Atzmueller, Subgroup discovery – advanced review, WIREs: Data Min. Knowl. Discov. (2015)
  • M. Atzmueller et al., Face-to-face contacts at a conference: dynamics of communities and roles
  • M. Atzmueller et al., Fast subgroup discovery for continuous target concepts
  • M. Atzmueller, F. Lemmerich, B. Krause, A. Hotho, Who are the Spammers? Understandable local patterns for concept...
  • M. Atzmueller, F. Mitzlaff, Towards mining descriptive community patterns, in: Workshop on Mining Patterns and...
  • M. Atzmueller et al., Efficient descriptive community mining
  • M. Atzmueller et al., A case-based approach for characterization and analysis of subgroup patterns, J. Appl. Intell. (2008)
  • M. Atzmueller, F. Puppe, H.-P. Buscher, Towards knowledge-intensive subgroup discovery, in: Proc. LWA 2004, Germany,...
  • M. Atzmueller, F. Puppe, H.-P. Buscher, Exploiting Background knowledge for knowledge-intensive subgroup discovery, in:...
  • R. Bayardo et al., Constraint-based rule mining in large, dense databases, Data Min. Knowl. Discov. (2000)
  • D. Benz et al., The social bookmark and publication management system BibSonomy, VLDB (2010)
  • I. Cantador et al., 2nd workshop on information heterogeneity and fusion in recommender systems (HetRec)
  • S. Fortunato et al., Encyclopedia of Complexity and System Science (2007)
  • L. Freeman, Segregation in social networks, Sociol. Methods Res. (1978)
  • J. Fürnkranz et al., Guest editorial: global modeling using local patterns, Data Min. Knowl. Discov. (2010)
  • E. Galbrun et al., Overlapping community detection in labeled graphs, Data Min. Knowl. Discov. (2014)
  • U. Gargi et al., Large-scale community detection on YouTube for topic discovery and exploration
  • M. Girvan et al., Community structure in social and biological networks, PNAS (2002)
  • S. Gregory, Finding overlapping communities in networks by label propagation, New J. Phys. (12)...
  • H. Grosskreutz et al., Tight optimistic estimates for fast subgroup discovery
  • S. Günnemann et al., GAMer: a synthesis of subspace clustering and dense subgraph mining
  • J. Han et al., Mining frequent patterns without candidate generation
  • W. Klösgen, Explora: a multipattern and multistrategy discovery assistant
  • A.J. Knobbe et al., Pattern teams
  • M. Koyuturk et al., Assessing significance of connectivity and conservation in protein interaction networks, J. Comput. Biol. (2007)
  • J.M. Kumpula et al., Sequential algorithm for fast clique percolation, Phys. Rev. E (2008)
  • A. Lancichinetti, S. Fortunato, J. Kertész, Detecting the overlapping and hierarchical community structure in complex...