Finding density-based subspace clusters in graphs with feature vectors

Günnemann, Stephan; Boden, Brigitte; Seidl, Thomas

doi:10.1007/s10618-012-0272-z

Published: 03 June 2012

Data Mining and Knowledge Discovery, Volume 25, pages 243–269 (2012)

Abstract

Data sources representing attribute information in combination with network information are widely available in today’s applications. To realize the full potential for knowledge extraction, mining techniques like clustering should consider both information types simultaneously. Recent clustering approaches combine subspace clustering with dense subgraph mining to identify groups of objects that are similar in subsets of their attributes as well as densely connected within the network. While those approaches successfully circumvent the problem of full-space clustering, their restrictive cluster definitions can only capture clusters of certain shapes. In this work we introduce a density-based cluster definition that takes into account both the attribute similarity in subspaces and the local graph density, and thus enables us to detect clusters of arbitrary shape and size. Furthermore, we avoid redundancy in the result by selecting only the most interesting non-redundant clusters. Based on this model, we introduce the clustering algorithm DB-CSC, which uses a fixed point iteration method to efficiently determine the clustering solution. We prove the correctness and complexity of this fixed point iteration analytically. In thorough experiments we demonstrate the strength of DB-CSC in comparison to related approaches.
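The abstract describes the cluster model only at a high level. The Python sketch below illustrates one way a combined graph/attribute density notion and a fixed point iteration could fit together; it is not the authors' DB-CSC implementation. Assumptions (ours, not from the abstract): the graph is undirected and given as an adjacency dict; the graph part of a node's neighborhood is its k-hop neighborhood restricted to the current candidate node set; attribute similarity is the maximum norm over one fixed, non-empty subspace `dims` with threshold `eps`; and a node counts as dense ("core") if its combined neighborhood has at least `min_pts` members. All function names are hypothetical, and subspace search and redundancy elimination are omitted.

```python
from collections import deque


def k_hop_within(adj, start, k, allowed):
    """Nodes reachable from start in at most k hops, walking only through nodes in allowed."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue
        for nb in adj.get(node, ()):
            if nb in allowed and nb not in seen:
                seen.add(nb)
                frontier.append((nb, depth + 1))
    return seen


def combined_neighborhood(adj, feats, v, O, dims, eps, k):
    """Nodes of O that are within k hops of v (inside O) AND eps-close to v in subspace dims."""
    return {
        u for u in k_hop_within(adj, v, k, O)
        if max(abs(feats[u][d] - feats[v][d]) for d in dims) <= eps
    }


def density_components(adj, feats, O, dims, eps, k, min_pts):
    """DBSCAN-style expansion over O: groups of density-connected core nodes plus their neighbors."""
    neigh = {v: combined_neighborhood(adj, feats, v, O, dims, eps, k) for v in O}
    cores = {v for v in O if len(neigh[v]) >= min_pts}
    visited, clusters = set(), []
    for seed in sorted(cores):
        if seed in visited:
            continue
        cluster, queue = set(), deque([seed])
        visited.add(seed)
        while queue:
            v = queue.popleft()
            cluster.add(v)
            if v in cores:  # only core nodes propagate the cluster
                for u in neigh[v]:
                    if u not in visited:
                        visited.add(u)
                        queue.append(u)
        clusters.append(frozenset(cluster))
    return clusters


def fixed_point_clusters(adj, feats, nodes, dims, eps, k, min_pts):
    """Re-cluster each candidate inside itself until it reproduces itself (a fixed point)."""
    result, stack = set(), [frozenset(nodes)]
    while stack:
        O = stack.pop()
        comps = density_components(adj, feats, O, dims, eps, k, min_pts)
        if comps == [O]:
            result.add(O)  # clustering restricted to O returns exactly O
        else:
            stack.extend(comps)  # every component is a strict subset of O, so this terminates
    return result


if __name__ == "__main__":
    # Two triangles joined by the edge 2-3; dimension 0 separates them.
    adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
    feats = {0: [0.0, 5.0], 1: [0.1, 9.0], 2: [0.2, 1.0],
             3: [5.0, 5.0], 4: [5.1, 9.0], 5: [5.2, 1.0]}
    res = fixed_point_clusters(adj, feats, adj.keys(), dims=[0], eps=0.5, k=1, min_pts=3)
    print(sorted(sorted(c) for c in res))  # prints [[0, 1, 2], [3, 4, 5]]
```

Restricting every neighborhood to the current candidate set is what makes the iteration necessary in this sketch: once a candidate shrinks, a node whose density relied on neighbors outside it may lose its core status, and the candidate then shrinks or splits in the next round.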

Author information

Authors and Affiliations

Data Management and Data Exploration Group, RWTH Aachen University, Aachen, Germany
Stephan Günnemann, Brigitte Boden & Thomas Seidl

Corresponding author

Correspondence to Stephan Günnemann.

Additional information

Responsible editor: Dimitrios Gunopulos, Donato Malerba, Michalis Vazirgiannis.

About this article

Cite this article

Günnemann, S., Boden, B. & Seidl, T. Finding density-based subspace clusters in graphs with feature vectors. Data Min Knowl Disc 25, 243–269 (2012). https://doi.org/10.1007/s10618-012-0272-z

Received: 31 October 2011
Accepted: 16 May 2012
Issue Date: September 2012