MiMAG: mining coherent subgraphs in multi-layer graphs with edge labels

  • Regular Paper
Knowledge and Information Systems

Abstract

Detecting dense subgraphs such as cliques or quasi-cliques is an important graph mining problem. While this task is well established for simple graphs, today’s applications demand the analysis of more complex graphs. In this work, we consider a frequently observed type of graph in which edges represent different types of relations. These edge types can also be viewed as different “layers” of a graph, which is then called a “multi-layer graph”. Additionally, each edge may be annotated with a label that characterizes the given relation in more detail. Exploiting all of this information simultaneously supports the detection of more interesting subgraphs. We introduce the multi-layer coherent subgraph model, which defines clusters of vertices that are densely connected by edges with similar labels in a subset of the graph layers. We avoid redundancy in the result by selecting only the most interesting, non-redundant subgraphs for the output. Based on this model, we introduce the best-first search algorithm MiMAG. In thorough experiments, we demonstrate the strengths of MiMAG in comparison with related approaches on synthetic as well as real-world data sets.
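The cluster condition described above (dense connectivity via edges with similar labels in a subset of the layers) can be illustrated with a small Python check for one candidate vertex set. This is a minimal sketch under stated assumptions, not the paper's definition: density is taken as the usual γ-quasi-clique condition (every vertex of O is adjacent to at least ⌈γ(|O|−1)⌉ other vertices of O in that layer), label similarity is assumed to mean that all labels on edges inside O lie in an interval of width `w`, and `min_size` and `min_layers` are hypothetical threshold names. Each layer is represented, again by assumption, as a dict mapping `frozenset({u, v})` to the numeric edge label.

```python
from itertools import combinations
from math import ceil


def _has_edge(layer, u, v):
    # Assumed representation: a layer is a dict {frozenset({u, v}): label}.
    return frozenset((u, v)) in layer


def is_quasi_clique(vertices, layer, gamma):
    """Assumed gamma-quasi-clique test: every vertex of the set is adjacent,
    in this layer, to at least ceil(gamma * (|O| - 1)) other vertices of the set."""
    need = ceil(gamma * (len(vertices) - 1))
    return all(
        sum(1 for u in vertices if u != v and _has_edge(layer, u, v)) >= need
        for v in vertices
    )


def labels_coherent(vertices, layer, w):
    """Assumed similarity criterion: labels on edges inside the set differ by at most w."""
    labels = [layer[frozenset(p)] for p in combinations(vertices, 2) if frozenset(p) in layer]
    return not labels or max(labels) - min(labels) <= w


def relevant_layers(vertices, layers, gamma, w):
    """Indices of the layers in which the set is both dense and label-coherent."""
    return [
        d for d, layer in enumerate(layers)
        if is_quasi_clique(vertices, layer, gamma) and labels_coherent(vertices, layer, w)
    ]


def is_valid_cluster(vertices, layers, gamma, w, min_size, min_layers):
    """Hypothetical validity test: large enough and coherent in enough layers."""
    return (
        len(vertices) >= min_size
        and len(relevant_layers(vertices, layers, gamma, w)) >= min_layers
    )
```

For example, with a `star` layer that contains only the edges {1, 3} and {2, 3}, `is_quasi_clique({1, 2}, star, 0.5)` is False while `is_quasi_clique({1, 2, 3}, star, 0.5)` is True. This is the non-monotonic behavior mentioned in note 9, and it is why a search cannot simply discard all supersets of a set that fails the density test.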

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12

Notes

  1. In the following, we use the terms “layers” and “dimensions” interchangeably.

  2. The contents of this paper are also included in the first author’s Ph.D. thesis [5].

  3. To avoid confusion, we use the term “vertex” for a vertex in the original graph and the term “node” for the nodes of the set enumeration tree, which represent sets of vertices.

  4. Note that for each layer we get a different set enumeration tree (cf. Fig. 4, top) as different subtrees might be pruned in each layer.

  5. If one is interested in generating all patterns, an arbitrary traversal strategy can be used. We, however, want to determine only a subset of the patterns (the non-redundant, high-quality ones).

  6. Due to our redundancy definition, clusters of equal quality may be redundant with respect to each other. Since in this case there is no indication that one of the clusters is “better” than the other(s), we simply keep in the result the cluster that was detected first.

  7. Please note that the pruning strategies in lines 15 and 19 do not discard any valid clusters, but only vertices and dimensions that do not contribute to clusters (see Sect. 4.3).

  8. Assume there existed an admissible algorithm that expands fewer subtrees. Then, after this algorithm terminates, there is at least one subtree that it has not investigated but whose estimated quality is higher than the quality of the clusters in the result. Based on the information available to the algorithm, however, it cannot rule out that this subtree contains a valid cluster of this quality. Thus, the algorithm cannot be admissible.

  9. Note: This property does not hold for the cluster model itself (neither for the set of vertices nor for the relevant dimensions). It is possible that O does not form a quasi-clique in dimension i, but \(O^{\prime } \supset O\) does.

  10. http://imdb.com.

  11. http://www.cs.cornell.edu/projects/kddcup/datasets.html.

  12. http://dblp.uni-trier.de.
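Notes 3 to 8 outline the search pattern: nodes of a set enumeration tree [34] represent vertex sets, subtrees are explored in order of their estimated quality rather than in an arbitrary order, equally good redundant clusters are resolved in favor of the one found first, and the admissibility argument resembles the one known from A* search. The Python sketch below shows that generic pattern only; it is not the MiMAG algorithm. It uses a single tree instead of one per layer (note 4), and the function `best_first_clusters`, the toy graph `EDGES`, and the helpers `toy_quality`, `toy_upper_bound`, and `toy_is_redundant` are hypothetical placeholders. The bound must never underestimate the quality of any valid cluster in a subtree.

```python
import heapq
from itertools import count


def best_first_clusters(n, quality, upper_bound, is_redundant, k):
    """Best-first search over a set enumeration tree on vertices 0..n-1.

    quality(node)     -> float, or None if node is not a valid cluster.
    upper_bound(node) -> admissible bound on the quality of every valid cluster
                         in node's subtree, or None if the subtree can be pruned.
    is_redundant(cluster, result) -> True if cluster is redundant w.r.t. result.
    Returns up to k (cluster, quality) pairs in non-increasing quality order.
    """
    tie = count()  # unique tie-breaker: entries with equal keys pop in insertion order
    heap = []
    root_bound = upper_bound(())
    if root_bound is not None:
        heap.append((-root_bound, next(tie), "subtree", ()))
    result = []
    while heap and len(result) < k:
        neg_key, _, kind, node = heapq.heappop(heap)
        if kind == "cluster":
            # No queued subtree has a higher bound, so the cluster can be decided now;
            # on ties, the earlier-detected cluster is considered first (cf. note 6).
            if not is_redundant(node, result):
                result.append((node, -neg_key))
            continue
        q = quality(node)
        if q is not None:
            heapq.heappush(heap, (-q, next(tie), "cluster", node))
        # Children only append vertices larger than the node's last vertex,
        # so every vertex set is generated exactly once.
        start = node[-1] + 1 if node else 0
        for v in range(start, n):
            child = node + (v,)
            bound = upper_bound(child)
            if bound is not None:
                heapq.heappush(heap, (-bound, next(tie), "subtree", child))
    return result


# Hypothetical toy instance: one layer, clusters are cliques, quality = size.
EDGES = {frozenset(e) for e in [(0, 1), (0, 2), (1, 2), (2, 3), (2, 4), (3, 4)]}
N = 5


def is_clique(node):
    return all(frozenset((u, v)) in EDGES for i, u in enumerate(node) for v in node[i + 1:])


def toy_quality(node):
    return len(node) if len(node) >= 2 and is_clique(node) else None


def toy_upper_bound(node):
    if not is_clique(node):
        return None  # cliques are hereditary, so no superset can be a clique
    start = node[-1] + 1 if node else 0
    extendable = [v for v in range(start, N) if all(frozenset((u, v)) in EDGES for u in node)]
    return len(node) + len(extendable)


def toy_is_redundant(cluster, result, max_overlap=0.5):
    # Hypothetical criterion: redundant if at least half of its vertices
    # already belong to a cluster in the result (which is at least as good).
    return any(len(set(cluster) & set(c)) / len(cluster) >= max_overlap for c, _ in result)
```

On the toy graph, `best_first_clusters(N, toy_quality, toy_upper_bound, toy_is_redundant, k=2)` returns the triangles (0, 1, 2) and (2, 3, 4), each with quality 3; with a larger `k`, every two-vertex clique is rejected as redundant because it lies inside one of them. Clusters share the priority queue with subtrees, keyed by their exact quality, so a cluster is accepted only once no unexpanded subtree has a higher bound.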

References

  1. Aggarwal C, Wang H (2010) Managing and mining graph data. Springer, New York

  2. Araujo M, Günnemann S, Papadimitriou S, Faloutsos C, Basu P, Swami A, Papalexakis EE, Koutra D (2016) Discovery of “comet” communities in temporal and labeled graphs Com². Knowl Inf Syst 46(3):657–677. doi:10.1007/s10115-015-0847-2

  3. Berlingerio M, Coscia M, Giannotti F (2011) Finding and characterizing communities in multidimensional networks. In: ASONAM, pp 490–494. doi:10.1109/ASONAM.2011.104

  4. Beyer KS, Goldstein J, Ramakrishnan R, Shaft U (1999) When is “nearest neighbor” meaningful? In: ICDT, pp 217–235

  5. Boden B (2014) Combined clustering of graph and attribute data. PhD thesis, RWTH Aachen University

  6. Boden B, Günnemann S, Hoffmann H, Seidl T (2012) Mining coherent subgraphs in multi-layer graphs with edge labels. In: SIGKDD

  7. Boden B, Günnemann S, Hoffmann H, Seidl T (2013) RMiCS: a robust approach for mining coherent subgraphs in edge-labeled multi-layer graphs. In: SSDBM, p 23

  8. Cai D, Shao Z, He X, Yan X, Han J (2005) Community mining from multi-relational networks. PKDD 3721:445–452

  9. Cerf L, Besson J, Robardet C, Boulicaut JF (2008) Data-peeler: constraint-based closed pattern mining in n-ary relations. SDM 8:37–48

  10. Cerf L, Besson J, Robardet C, Boulicaut JF (2009a) Closed patterns meet n-ary relations. TKDD 3(1):1–3

  11. Cerf L, Nguyen TBN, Boulicaut JF (2009b) Discovering relevant cross-graph cliques in dynamic networks. In: ISMIS, pp 513–522

  12. Cheng Y, Zhao R (2009) Multiview spectral clustering via ensemble. In: GRC, IEEE, pp 101–106

  13. Dong X, Frossard P, Vandergheynst P, Nefedov N (2012) Clustering with multi-layer graphs: a spectral perspective. IEEE Trans Signal Process 60(11):5820–5831. doi:10.1109/TSP.2012.2212886

  14. Fortunato S (2010) Community detection in graphs. Phys Rep 486(3–5):75–174

  15. Günnemann S, Färber I, Boden B, Seidl T (2010) Subspace clustering meets dense subgraph mining: a synthesis of two paradigms. In: ICDM, pp 845–850

  16. Günnemann S, Boden B, Seidl T (2011) DB-CSC: a density-based approach for subspace clustering in graphs with feature vectors. In: PKDD, pp 565–580

  17. Günnemann S, Färber I, Müller E, Assent I, Seidl T (2011) External evaluation measures for subspace clustering. In: CIKM

  18. Günnemann S, Boden B, Seidl T (2012) Finding density-based subspace clusters in graphs with feature vectors. Data Min Knowl Discov 25(2):243–269

  19. Günnemann S, Färber I, Raubach S, Seidl T (2013) Spectral subspace clustering for graphs with feature vectors. In: ICDM, pp 231–240

  20. Günnemann S, Färber I, Boden B, Seidl T (2014) Gamer: a synthesis of subspace clustering and dense subgraph mining. Knowl Inf Syst 40(2):243–278

  21. Hanisch D, Zien A, Zimmer R, Lengauer T (2002) Co-clustering of biological networks and gene expression data. Bioinformatics 18:145–154

  22. Harary F, Norman R (1960) Some properties of line digraphs. Rendiconti del Circolo Matematico di Palermo 9(2):161–168

  23. Hart P, Nilsson N, Raphael B (1968) A formal basis for the heuristic determination of minimum cost paths. IEEE Trans Syst Sci Cybern 4(2):100–107. doi:10.1109/TSSC.1968.300136

  24. Kriegel HP, Kröger P, Zimek A (2009) Clustering high-dimensional data: a survey on subspace clustering, pattern-based clustering, and correlation clustering. TKDD 3(1):1–58. doi:10.1145/1497577.1497578

  25. Li M, Fan Y, Chen J, Gao L, Di Z, Wu J (2005) Weighted networks of scientific communication: the measurement and topological role of weight. Physica A: Stat Mech Appl 350(2):643–656

  26. Liu G, Wong L (2008) Effective pruning techniques for mining quasi-cliques. In: ECML/PKDD (2), pp 33–49

  27. Moser F, Colak R, Rafiey A, Ester M (2009) Mining cohesive patterns from graphs with feature vectors. In: SDM, pp 593–604

  28. Müller E, Assent I, Günnemann S, Krieger R, Seidl T (2009) Relevant subspace clustering: mining the most interesting non-redundant concepts in high dimensional data. In: ICDM, pp 377–386

  29. Müller E, Günnemann S, Assent I, Seidl T (2009) Evaluating clustering in subspace projections of high dimensional data. In: VLDB, pp 1270–1281

  30. Neville J, Adler M, Jensen D (2004) Spectral clustering with links and attributes. University of Massachusetts Amherst, Technical Report, Department of Computer Science

  31. Pearl J (1984) Heuristics: intelligent search strategies for computer problem solving. Addison-Wesley Pub. Co., Inc, Reading

  32. Pei J, Jiang D, Zhang A (2005) On mining cross-graph quasi-cliques. In: SIGKDD, pp 228–238

  33. Qi G, Aggarwal C, Huang T (2012) Community detection with edge content in social media networks. In: ICDE, pp 534–545

  34. Rymon R (1992) Search through systematic set enumeration. In: KR, pp 539–550

  35. Shiga M, Takigawa I, Mamitsuka H (2007) A spectral clustering approach to optimally combining numerical vectors with a modular network. In: SIGKDD, pp 647–656

  36. Spielman DA, Teng SH (1996) Spectral partitioning works: planar graphs and finite element meshes. In: FOCS, pp 96–105

  37. Spyropoulou E, De Bie T (2011) Interesting multi-relational patterns. In: ICDM, pp 675–684

  38. Tang L, Wang X, Liu H (2009a) Uncovering groups via heterogeneous interaction analysis. In: ICDM, pp 503–512

  39. Tang W, Lu Z, Dhillon IS (2009b) Clustering with multiple graphs. In: ICDM, pp 1016–1021

  40. Tang L, Wang X, Liu H (2012) Community detection via heterogeneous interaction analysis. Data Min Knowl Discov 25(1):1–33

  41. Wang J, Zeng Z, Zhou L (2006) Clan: an algorithm for mining closed cliques from large dense graph databases. In: ICDE, p 73. doi:10.1109/ICDE.2006.34

  42. Wu Z, Yin W, Cao J, Xu G, Cuzzocrea A (2013) Community detection in multi-relational social networks. In: Web Information Systems Engineering-WISE 2013. Springer, pp 43–56

  43. Zeng Z, Wang J, Zhou L, Karypis G (2006) Coherent closed quasi-clique discovery from large dense graph databases. In: SIGKDD, pp 797–802

  44. Zhou W, Jin H, Liu Y (2012) Community discovery and profiling with social messages. In: SIGKDD, pp 388–396

  45. Zhou Y, Cheng H, Yu JX (2009) Graph clustering based on structural/attribute similarities. PVLDB 2(1):718–729

Author information

Corresponding author

Correspondence to Brigitte Boden.

About this article

Cite this article

Boden, B., Günnemann, S., Hoffmann, H. et al. MiMAG: mining coherent subgraphs in multi-layer graphs with edge labels. Knowl Inf Syst 50, 417–446 (2017). https://doi.org/10.1007/s10115-016-0949-5

  • DOI: https://doi.org/10.1007/s10115-016-0949-5
