Finding maximal homogeneous clique sets

  • Regular Paper
Knowledge and Information Systems

Abstract

Many datasets can be encoded as graphs in which each vertex is associated with a set of labels. We consider graphs of this kind and propose to look for patterns called maximal homogeneous clique sets: such a pattern is a subgraph that is structured in several large cliques and in which all vertices share enough labels. We present an algorithm based on graph enumeration that computes all patterns satisfying user-defined constraints on the number of separated cliques, on the size of these cliques, and on the number of labels shared by all the vertices. Our approach is tested on real datasets based on a social network of scientific collaborations and on a biological network of protein–protein interactions. The experiments show that the patterns are useful for exhibiting subgraphs organized in several core modules of interactions. Performance is reported on real and synthetic data, showing that the approach can be applied to different kinds of large datasets.
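To make the three constraints concrete, the following is a minimal sketch, not the enumeration algorithm of the paper. It checks one candidate label set under an assumed reading of the abstract: take the vertices carrying every label of the candidate, list the maximal cliques of the subgraph they induce, and keep the candidate if enough sufficiently large cliques remain. The function names, the parameters `min_labels`, `min_cliques` and `min_size`, and the toy graph are all hypothetical, and the maximality of the pattern itself is not checked.

```python
def maximal_cliques(adj, vertices):
    """Bron-Kerbosch (no pivoting) on the subgraph induced by `vertices`."""
    found = []

    def expand(r, p, x):
        if not p and not x:
            found.append(frozenset(r))  # r cannot be extended: maximal clique
            return
        for v in list(p):
            nbrs = adj[v] & vertices
            expand(r | {v}, p & nbrs, x & nbrs)
            p = p - {v}  # v has been fully explored
            x = x | {v}  # exclude v from later cliques

    expand(set(), set(vertices), set())
    return found


def check_label_set(adj, labels, label_set, min_labels, min_cliques, min_size):
    """Return the large cliques among vertices sharing `label_set`, or None.

    Assumed reading of the constraints (hypothetical parameter names):
    at least `min_labels` shared labels, at least `min_cliques` cliques,
    each clique with at least `min_size` vertices.
    """
    if len(label_set) < min_labels:
        return None
    support = {v for v, ls in labels.items() if label_set <= ls}
    large = [c for c in maximal_cliques(adj, support) if len(c) >= min_size]
    return large if len(large) >= min_cliques else None


# Toy graph (illustrative data only): two triangles joined by edge c-d,
# plus a vertex g adjacent to a, b, c that lacks label "y".
edges = ["ab", "ac", "bc", "de", "df", "ef", "cd", "ga", "gb", "gc"]
adj = {v: set() for v in "abcdefg"}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)
labels = {v: {"x", "y"} for v in "abcdef"}
labels["g"] = {"x"}

result = check_label_set(adj, labels, {"x", "y"},
                         min_labels=2, min_cliques=2, min_size=3)
print(sorted("".join(sorted(c)) for c in result))  # ['abc', 'def']
```

In this toy graph the clique abc is kept although abcg is a larger clique of the whole graph, because g does not carry both labels; cliques are only maximal within the subgraph induced by the vertices sharing the label set.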

Notes

  1. However, this does not require that cliques in \(M\) are maximal cliques in the whole graph \(\mathcal{G }\).

  2. For the sake of simplicity in the examples, a clique is simply denoted by a sequence of letters representing the clique vertices.

  3. However, if needed, a simple post-processing can be used to remove these small cliques.

  4. http://www.informatik.uni-trier.de/~ley/db/.

  5. http://string-db.org/, snapshot of November 2009.

  6. http://bsmc.insa-lyon.fr/squat/.

  7. This confidence is a measure provided by STRING. A high confidence means that there is strong evidence that the interaction exists.

  8. http://www.genenames.org/.

  9. We retain the term "list" used in L2L; however, these lists are simply sets.

  10. The category of the lists containing genes categorized under biological process in Gene Ontology [1].

  11. Sensory perception of light stimulus is defined as the series of events required for an organism to receive a sensory light stimulus, convert it to a molecular signal, and recognize and characterize the signal.

References

  1. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G (2000) Gene ontology: tool for the unification of biology. Nat Genet 25:25–29

  2. Becquet C, Blachon S, Jeudy B, Boulicaut JF, Gandrillon O (2002) Strong-association-rule mining for large-scale gene-expression data analysis: a case study on human SAGE data. Genome Biol 3(12):1–16

  3. Berlingerio M, Bonchi F, Bringmann B, Gionis A (2009) Mining graph evolution rules. In: European conference on machine learning and knowledge discovery in databases (ECML/PKDD), pp 115–130

  4. Boden B (2012) Efficient combined clustering of graph and attribute data. In: PhD Workshop of the 38th international conference on very large data bases (VLDB), pp 13–18

  5. Bonchi F, Giannotti F, Mazzanti A, Pedreschi D (2005) Efficient breadth-first mining of frequent pattern with monotone constraints. Knowl Inf Syst (KAIS) 8(2):131–153

  6. Bringmann B, Nijssen S (2008) What is frequent in a single graph? In: Pacific-Asia conference on knowledge discovery and data mining (PAKDD), pp 858–863

  7. Bringmann B, Zimmermann A (2009) One in a million: picking the right patterns. Knowl Inf Syst (KAIS) 18(1):61–81

  8. Calders T, Ramon J, Dyck DV (2008) Anti-monotonic overlap-graph support measures. In: International conference on data mining (ICDM), pp 73–82

  9. Chakrabarti D, Faloutsos C (2006) Graph mining: laws, generators, and algorithms. ACM Comput Surv 38(1):1–69

  10. Erdős P, Rényi A (1959) On random graphs. Publicationes Mathematicae 6:290–297

  11. Fukuzaki M, Seki M, Kashima H, Sese J (2010) Finding itemset-sharing patterns in a large itemset-associated graph. In: Pacific-Asia conference on knowledge discovery and data mining (PAKDD), pp 147–159

  12. Gallo A, Miettinen P, Mannila H (2008) Finding subgroups having several descriptions: algorithms for redescription mining. In: SIAM international conference on data mining (SDM), pp 334–345

  13. Ge R, Ester M, Gao BJ, Hu Z, Bhattacharya B, Ben-Moshe B (2008) Joint cluster analysis of attribute data and relationship data: the connected k-center problem, algorithms and applications. ACM Trans Knowl Discov Data (TKDD) 2(2):1–35

  14. Hanisch D, Zien A, Zimmer R, Lengauer T (2002) Co-clustering of biological networks and gene expression data. Bioinformatics 18:145–154

  15. Jensen LJ, Kuhn M, Stark M, Chaffron S, Creevey C, Muller J, Doerks T, Julien P, Roth A, Simonovic M, Bork P, von Mering C (2009) STRING 8—a global view on proteins and their functional interactions in 630 organisms. Nucl Acids Res 37:412–416

  16. Jiang D, Pei J (2009) Mining frequent cross-graph quasi-cliques. ACM Trans Knowl Discov Data (TKDD) 2(4):1–42

  17. Kang U, Tsourakakis CE, Faloutsos C (2011) Pegasus: mining peta-scale graphs. Knowl Inf Syst (KAIS) 27(2):303–325

  18. Khan A, Yan X, Wu KL (2010) Towards proximity pattern mining in large graphs. In: ACM SIGMOD international conference on management of data, pp 867–878

  19. Knobbe AJ, Ho EKY (2006) Pattern teams. In: European conference on machine learning and knowledge discovery in databases (ECML/PKDD), pp 577–584

  20. Kuramochi M, Karypis G (2005) Finding frequent patterns in a large sparse graph. Data Mining Knowl Discov (DMKD) 11(3):243–271

  21. Lahiri M, Berger-Wolf TY (2010) Periodic subgraph mining in dynamic networks. Knowl Inf Syst (KAIS) 24(3):467–497

  22. Leyritz J, Schicklin S, Blachon S, Keime C, Robardet C, Boulicaut JF, Besson J, Pensa RG, Gandrillon O (2008) Squat: a web tool to mine human, murine and avian sage data. BMC Bioinform 9(1):378

  23. Liu G, Wong L (2008) Effective pruning techniques for mining quasi-cliques. In: European conference on machine learning and knowledge discovery in databases (ECML/PKDD), pp 33–49

  24. Miyoshi Y, Ozaki T, Ohkawa T (2009) Frequent pattern discovery from a single graph with quantitative itemsets. In: International workshop on mining multiple information sources (ICDM workshop), pp 527–532

  25. Moon J, Moser L (1965) On cliques in graphs. Israel J Math 3:23–28

  26. Moser F, Colak R, Rafiey A, Ester M (2009) Mining cohesive patterns from graphs with feature vectors. In: SIAM international conference on data mining (SDM), pp 593–604

  27. Mougel PN, Plantevit M, Rigotti C, Gandrillon O, Boulicaut JF (2010) Constraint-based mining of sets of cliques sharing vertex properties. In: International workshop on analysis of complex networks (ECML/PKDD workshop), pp 1–14

  28. Newman JC, Weiner AM (2005) L2l: a simple tool for discovering the hidden significance in microarray expression data. Genome Biol 6(9):81

  29. Nguyen KN, Cerf L, Plantevit M, Boulicaut JF (2011) Multidimensional association rules in boolean tensors. In: SIAM international conference on data mining (SDM), pp 570–581

  30. Raedt LD, Zimmermann A (2007) Constraint-based pattern set mining. In: SIAM international conference on data mining (SDM), pp 1–12

  31. Silva A, Meira W, Zaki MJ (2010) Structural correlation pattern mining for large graphs. In: International workshop on mining and learning with graphs (MLG), pp 119–126

  32. Silva A, Meira W, Zaki MJ (2012) Mining attribute-structure correlated patterns in large attributed graphs. Proc VLDB Endow (PVLDB) 5(5):466–477

  33. Tomita E, Tanaka A, Takahashi H (2006) The worst-case time complexity for generating all maximal cliques and computational experiments. Theor Comput Sci (TCS) 363:28–42

  34. Ulitsky I, Shamir R (2007) Identification of functional modules using network topology and high-throughput data. BMC Syst Biol 1:8

  35. Xu Z, Ke Y, Wang Y, Cheng H, Cheng J (2012) A model-based approach to attributed graph clustering. In: Proceedings of the 2012 ACM SIGMOD international conference on management of data, New York, NY, USA, pp 505–516

  36. Yan X, Han J (2002) gSpan: graph-based substructure pattern mining. In: International conference on data mining (ICDM), pp 721–724

  37. Zhou Y, Cheng H, Yu JX (2009) Graph clustering based on structural/attribute similarities. Proc VLDB Endow (PVLDB) 2(1):718–729

  38. Zhou Y, Cheng H, Yu JX (2010) Clustering large attributed graphs: an efficient incremental approach. In: International conference on data mining (ICDM), pp 689–698

Acknowledgments

We would like to thank the anonymous referees for their constructive comments and useful suggestions. This work is partly funded by the Rhône-Alpes Complex Systems Institute (IXXI) through the project REHMI and by the French National Research Agency (ANR) through the projects FOSTER (ANR-2010-COSI-012) and BINGO2 (ANR-07-MDCO-014).

Corresponding author

Correspondence to Christophe Rigotti.

About this article

Cite this article

Mougel, PN., Rigotti, C., Plantevit, M. et al. Finding maximal homogeneous clique sets. Knowl Inf Syst 39, 579–608 (2014). https://doi.org/10.1007/s10115-013-0625-y

  • DOI: https://doi.org/10.1007/s10115-013-0625-y
