Description-oriented community detection using exhaustive subgroup discovery

doi:10.1016/j.ins.2015.05.008

Information Sciences

Volume 329, 1 February 2016, Pages 965-984

https://doi.org/10.1016/j.ins.2015.05.008

Abstract

Communities can intuitively be defined as subsets of nodes of a graph with a dense structure in the corresponding subgraph. However, mining such communities usually takes only structural aspects into account. Typically, neither a concise nor an easily interpretable community description is provided.

To tackle this issue, this paper focuses on description-oriented community detection using subgroup discovery. In order to provide communities that are both structurally valid and interpretable, we utilize the graph structure as well as additional descriptive features of the graph’s nodes. A descriptive community pattern built upon these features then describes and identifies a community, i.e., a set of nodes, and vice versa. Essentially, we mine patterns in the “description space” characterizing interesting sets of nodes (i.e., subgroups) in the “graph space”; the interestingness of a community is evaluated by a selectable quality measure.

We aim at identifying communities according to standard community quality measures, while providing characteristic descriptions of these communities at the same time. For this task, we propose several optimistic estimates of standard community quality functions to be used for efficient pruning of the search space in an exhaustive branch-and-bound algorithm. We demonstrate our approach in an evaluation using five real-world data sets, obtained from three different social media applications.

Introduction

Classic community detection (see, e.g., [17] for a survey) just identifies subgroups of nodes with a dense structure and lacks an interpretable description. In contrast, this paper focuses on the task of description-oriented community detection. Using additional descriptive features of the nodes contained in the network, we approach the task of identifying communities as sets of nodes together with a description, i.e., a logical formula on the values of the nodes’ descriptive features. Such a community pattern then provides an intuitive description of the community, e.g., by an easily interpretable conjunction of attribute–value pairs. This is usually not achieved by classical community mining methods that consider the nodes of a network (e.g., denoting users in a social network) as mere strings or ids.

We present an algorithm for description-oriented community detection of the top-k communities (described by community patterns) with respect to a number of standard community evaluation functions. The method is based on an adapted subgroup discovery approach [10], [36], and also tackles typical problems that are not addressed by standard approaches for community detection such as pathological cases like small community sizes. We focus on interpretable patterns that can easily be incorporated into a practical application, for example, for recommendations in social bookmarking systems. It is important to note that we focus on static social graphs and do not take the dynamics into account since we aim to characterize a given community (allocation) for a given fixed interaction structure. Also, since in practice the entities in a network tend to belong to a number of different communities, the presented method naturally captures overlapping community allocations. Moreover, in contrast to global approaches, we focus on the discovery of local communities. According to the idea of local pattern mining, e.g., [20], we do not try to find a complete (global) partitioning of the network. Instead, we consider a set of local, potentially overlapping communities. These should be as exceptional as possible with respect to a given community quality measure.

We demonstrate our approach on several social media applications such as social networking and social bookmarking systems that provide interaction networks like explicit friendship relations between users. However, the presented approach is not limited to such systems and can be applied to any kind of graph-structured data for which additional descriptive features (node labels) are available, e.g., certain activity in telephone networks or interactions in face-to-face contacts [6] that also utilize tags or topic descriptions for the contained relations.

As an accompanying example, throughout the paper we use the friendship graph of the social bookmarking system BibSonomy [15]. In BibSonomy, users can declare their friendship toward other users, thus creating a directed graph with users as nodes. At the same time, each user collects and tags resources like publications and web pages. Thus, a user’s set of tags can be considered as a description of that user’s interests. The community mining task here is to find user groups in which users are well connected by their friendship links and share a common interest in one or more features (tags).
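The mapping from a description to a set of nodes can be illustrated with a minimal sketch. The user names, tags, and links below are invented toy data, and the helper names `cover` and `induced_links` are hypothetical; the sketch only assumes that a pattern is a conjunction of tags and that it selects every user whose tag set contains all of them.

```python
# Hypothetical toy data (not taken from BibSonomy): user -> set of tags.
user_tags = {
    "alice": {"python", "datamining", "web"},
    "bob": {"python", "datamining"},
    "carol": {"datamining", "graphs"},
    "dave": {"web", "design"},
}

# Directed friendship links (source declares friendship toward target).
friend_of = {
    ("alice", "bob"),
    ("bob", "alice"),
    ("bob", "carol"),
    ("dave", "alice"),
}


def cover(pattern, tags_by_user):
    """Users whose tag set contains every tag of the pattern (a conjunction)."""
    return {user for user, tags in tags_by_user.items() if pattern <= tags}


def induced_links(nodes, links):
    """Links whose source and target both lie inside the given node set."""
    return {(u, v) for (u, v) in links if u in nodes and v in nodes}


# The pattern "python AND datamining" describes a candidate community.
community = cover({"python", "datamining"}, user_tags)
links = induced_links(community, friend_of)
print(sorted(community), sorted(links))
```

A more specific pattern can only select the same users or fewer, which is the property that makes pruning in the description space possible.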

Overall, the contribution of this paper can be summarized as follows:

1. We first introduce description-oriented community detection and present the COMODO algorithm for obtaining the k-best community patterns using a given community evaluation measure. COMODO is a branch-and-bound algorithm based on an exhaustive subgroup discovery approach.
2. For fast description-oriented community detection using COMODO, we propose optimistic estimates [25], [62] which are efficient to compute. We consider a number of standard community quality functions: the segregation index [19], the inverse average ODF (out-degree fraction) [38], and the modularity [49]. We discuss the different measures for unweighted and weighted graphs, and extend the optimistic estimates accordingly.
3. We evaluate the presented approach using five data sets from three real-world social applications, i.e., from the social bookmarking systems BibSonomy and delicious, and from the social media platform last.fm.
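To make the role of an optimistic estimate concrete, the following is a minimal sketch of a top-k branch-and-bound search over conjunctive patterns. It is not the COMODO implementation: the function names, the toy graph, and the `min_size` parameter are assumptions for illustration. The quality used is the modularity contribution of a single community in an undirected, unweighted graph, MOD(S) = m_S/m - (vol(S)/(2m))^2, and the bound is a simple one derived in the code comments, which may differ from the estimates proposed in the paper.

```python
import heapq


def modularity_and_inner_edges(cover, edges, degree, m):
    """Return (MOD(S), m_S) for node set S = cover; assumes m > 0."""
    m_s = sum(1 for u, v in edges if u in cover and v in cover)
    vol = sum(degree[u] for u in cover)
    return m_s / m - (vol / (2 * m)) ** 2, m_s


def optimistic_estimate(m_s, m):
    """Upper bound on MOD(S') for every subset S' of S.

    For S' within S: m_S' <= m_S and vol(S') >= 2 * m_S', hence
    MOD(S') <= x/m - (x/m)^2 with x = m_S'. This expression grows
    until x = m/2, so the bound is taken at x = min(m_S, m/2).
    """
    x = min(m_s, m / 2)
    return x / m - (x / m) ** 2


def top_k_patterns(node_attrs, edges, k=3, min_size=2):
    """Depth-first search over attribute conjunctions with pruning."""
    m = len(edges)
    degree = {u: 0 for u in node_attrs}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    selectors = sorted({a for attrs in node_attrs.values() for a in attrs})
    top = []  # min-heap of (quality, pattern); top[0] is the k-th best
    evaluated = 0

    def search(pattern, cover, start):
        nonlocal evaluated
        for i in range(start, len(selectors)):
            new_cover = {u for u in cover if selectors[i] in node_attrs[u]}
            if len(new_cover) < min_size:
                continue  # refinements can only shrink the cover further
            evaluated += 1
            new_pattern = pattern + (selectors[i],)
            quality, m_s = modularity_and_inner_edges(new_cover, edges, degree, m)
            if len(top) < k:
                heapq.heappush(top, (quality, new_pattern))
            elif quality > top[0][0]:
                heapq.heapreplace(top, (quality, new_pattern))
            # Prune: no refinement of new_pattern can beat the k-th best.
            if len(top) == k and optimistic_estimate(m_s, m) <= top[0][0]:
                continue
            search(new_pattern, new_cover, i + 1)

    search((), set(node_attrs), 0)
    return sorted(top, reverse=True), evaluated


# Toy graph (hypothetical): two triangles joined by one bridge edge.
node_attrs = {1: {"a", "x"}, 2: {"a", "x"}, 3: {"a"},
              4: {"b"}, 5: {"b", "x"}, 6: {"b"}}
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
best, evaluated = top_k_patterns(node_attrs, edges, k=2)
print(best, evaluated)
```

The search stays exhaustive in effect: a branch is skipped only when its bound shows that no refinement could enter the current top-k list.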

The remainder of the paper is structured as follows: Section 2 summarizes basics of subgroup discovery, and provides general notions of graphs and community mining measures. Next, Section 3 introduces the proposed approach for description-oriented community detection and presents a number of optimistic estimates for standard community evaluation functions. After that, Section 4 discusses related work. For demonstrating the effectiveness and validity of the presented approach, Section 5 provides experiments using five data sets and discusses their results in the context of the three real-world applications. Finally, Section 6 concludes the paper with a summary and directions for future research.

Section snippets

Preliminaries

In the following, we briefly introduce basic notions with respect to pattern mining using subgroup discovery, graphs, and community quality measures.
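As a rough illustration of the community quality measures named in the introduction, the sketch below computes them for a node set in an undirected, unweighted graph. The formulas are common textbook formulations and are an assumption here, since the paper's exact definitions are not part of this excerpt; the function names and the toy graph are hypothetical.

```python
from collections import Counter


def segregation_index(n, edges, community):
    """1 - (observed cross edges / expected cross edges under random mixing)."""
    n_s = len(community)
    cross = sum(1 for u, v in edges if (u in community) != (v in community))
    # A random node pair straddles the boundary with probability
    # 2 * n_s * (n - n_s) / (n * (n - 1)).
    expected = len(edges) * 2 * n_s * (n - n_s) / (n * (n - 1))
    if expected == 0:
        return 0.0  # assumption: empty or all-covering sets get quality 0
    return 1 - cross / expected


def inverse_average_odf(edges, community):
    """1 - average fraction of each member's edges that leave the community."""
    if not community:
        return 0.0
    degree, leaving = Counter(), Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if (u in community) != (v in community):
            leaving[u if u in community else v] += 1
    # Isolated members contribute an out-degree fraction of 0.
    odf = [leaving[u] / degree[u] for u in community if degree[u] > 0]
    return 1 - sum(odf) / len(community)


def local_modularity(edges, community):
    """Modularity contribution of one community: m_S/m - (vol(S)/(2m))^2."""
    m = len(edges)
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    m_s = sum(1 for u, v in edges if u in community and v in community)
    vol = sum(degree[u] for u in community)
    return m_s / m - (vol / (2 * m)) ** 2


# Toy graph (hypothetical): two triangles joined by one bridge edge.
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
triangle = {1, 2, 3}
print(segregation_index(6, edges, triangle),
      inverse_average_odf(edges, triangle),
      local_modularity(edges, triangle))
```

All three measures reward node sets with many internal and few outgoing edges, but they weight size and degree differently, which is why the choice of measure changes which patterns rank highest.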

Description-oriented community detection

Many community mining algorithms collect sets of nodes denoting the individual communities, focusing on structural aspects of the graph; typically there is no simple and easily interpretable description. In our example, a user community would be represented merely as a set of names (strings) or ids. To bridge this gap, we combine community detection and subgroup discovery in a unified approach for mining community patterns. This tackles one of the basic problems of community detection in many

Related work

Community detection methods can be classified according to several dimensions. We distinguish between methods that detect disjoint communities, i.e., where actors in a network can only belong to exactly one community, and those that allow overlapping communities, where actors can belong to multiple communities at the same time. Furthermore, we distinguish between methods that work on extended (attributed) graphs, e.g., with descriptive information about the nodes, and methods that work on the

Experiments

In the following, we first describe the data sets, before we present the conducted experiments and discuss the results. We focus on evaluating the efficiency of the presented pruning approach considering the search steps of the COMODO algorithm. Furthermore, we discuss properties of the discovered communities in order to assess their validity.

Conclusions

In this paper, we have presented an approach for description-oriented community detection using exhaustive subgroup discovery. We presented the COMODO algorithm for the discovery of community patterns. Furthermore, we proposed suitable optimistic estimates for a range of standard community quality functions; the optimistic estimates are efficient to compute and enable an effective approach. Our proposed method ensures that the top-k communities (representable by a given set of describing

Acknowledgements

This work has been partially supported by the VENUS research cluster at the Interdisciplinary Research Center for Information System Design (ITeG) at Kassel University, and by the Commune project funded by the Hertie Foundation.

References (66)

S. Fortunato, Community detection in graphs, Phys. Rep. (2010)
S.E. Schaeffer, Graph clustering, Comput. Sci. Rev. (2007)
M. Adnan et al., Identifying social communities by frequent pattern mining
M. Atzmueller, Knowledge-Intensive Subgroup Mining – Techniques for Automatic and Interactive Discovery
M. Atzmueller, Data mining on social interaction networks, J. Data Min. Digital Humanities (2014)
M. Atzmueller, Analyzing and grounding social interaction in online and offline networks
M. Atzmueller, Subgroup discovery – advanced review, WIREs: Data Min. Knowl. Discov. (2015)
M. Atzmueller et al., Face-to-face contacts at a conference: dynamics of communities and roles
M. Atzmueller et al., Fast subgroup discovery for continuous target concepts
M. Atzmueller, F. Lemmerich, B. Krause, A. Hotho, Who are the Spammers? Understandable local patterns for concept...
M. Atzmueller, F. Mitzlaff, Towards mining descriptive community patterns, in: Workshop on Mining Patterns and...
M. Atzmueller et al., Efficient descriptive community mining
M. Atzmueller et al., A case-based approach for characterization and analysis of subgroup patterns, J. Appl. Intell. (2008)
M. Atzmueller, F. Puppe, H.-P. Buscher, Towards knowledge-intensive subgroup discovery, in: Proc. LWA 2004, Germany,...
M. Atzmueller, F. Puppe, H.-P. Buscher, Exploiting Background knowledge for knowledge-intensive subgroup discovery, in:...
R. Bayardo et al., Constraint-based rule mining in large, dense databases, Data Min. Knowl. Discov. (2000)
D. Benz et al., The social bookmark and publication management system BibSonomy, VLDB (2010)
I. Cantador et al., 2nd workshop on information heterogeneity and fusion in recommender systems (HetRec)
S. Fortunato et al., Encyclopedia of Complexity and System Science (2007)
L. Freeman, Segregation in social networks, Sociol. Methods Res. (1978)
J. Fürnkranz et al., Guest editorial: global modeling using local patterns, Data Min. Knowl. Discov. (2010)
E. Galbrun et al., Overlapping community detection in labeled graphs, Data Min. Knowl. Discov. (2014)
U. Gargi et al., Large-scale community detection on YouTube for topic discovery and exploration
M. Girvan et al., Community structure in social and biological networks, PNAS (2002)
S. Gregory, Finding overlapping communities in networks by label propagation, New J. Phys. (12)...
H. Grosskreutz et al., Tight optimistic estimates for fast subgroup discovery
S. Günnemann et al., GAMer: a synthesis of subspace clustering and dense subgraph mining
J. Han et al., Mining frequent patterns without candidate generation
W. Klösgen, Explora: a multipattern and multistrategy discovery assistant
A.J. Knobbe et al., Pattern teams
M. Koyuturk et al., Assessing significance of connectivity and conservation in protein interaction networks, J. Comput. Biol. (2007)
J.M. Kumpula et al., Sequential algorithm for fast clique percolation, Phys. Rev. E (2008)
A. Lancichinetti, S. Fortunato, J. Kertész, Detecting the overlapping and hierarchical community structure in complex...
