Abstract
Many datasets can be encoded as graphs whose vertices carry sets of labels. We consider graphs of this kind and propose to look for patterns called maximal homogeneous clique sets: such a pattern is a subgraph structured as several large cliques in which all vertices share enough labels. We present an algorithm based on graph enumeration that computes all patterns satisfying user-defined constraints on the number of separated cliques, on the size of these cliques, and on the number of labels shared by all the vertices. Our approach is tested on real datasets based on a social network of scientific collaborations and on a biological network of protein–protein interactions. The experiments show that the patterns are useful for exhibiting subgraphs organized in several core modules of interactions. Performance is reported on real data and on synthetic data, showing that the approach can be applied to different kinds of large datasets.
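As an illustration of the three user-defined constraints named in the abstract, the following minimal sketch checks whether a candidate set of cliques satisfies them on a small labeled graph. It is not the enumeration algorithm of the paper and it does not test maximality or how the cliques are separated; the function names, the parameter names (`min_cliques`, `min_size`, `min_labels`) and the toy graph are assumptions made for this example only.

```python
from collections import defaultdict
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map from a list of (u, v) edges."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def is_clique(adj, vertices):
    """Return True if every pair of distinct vertices is adjacent."""
    return all(v in adj[u] for u, v in combinations(sorted(vertices), 2))


def shared_labels(labels, vertices):
    """Return the labels carried by every vertex in `vertices`."""
    vertices = list(vertices)
    if not vertices:
        return set()
    common = set(labels[vertices[0]])
    for v in vertices[1:]:
        common &= labels[v]
    return common


def satisfies_constraints(adj, labels, cliques, min_cliques, min_size, min_labels):
    """Check the three thresholds (hypothetical names) on a candidate pattern.

    cliques     -- list of vertex sets, each expected to be a clique
    min_cliques -- minimum number of cliques in the pattern
    min_size    -- minimum number of vertices in each clique
    min_labels  -- minimum number of labels shared by all vertices
    """
    if len(cliques) < min_cliques:
        return False
    for clique in cliques:
        if len(clique) < min_size or not is_clique(adj, clique):
            return False
    all_vertices = set().union(*cliques)
    return len(shared_labels(labels, all_vertices)) >= min_labels


# Toy graph (assumed data): two triangles joined by the edge c-d.
edges = [("a", "b"), ("a", "c"), ("b", "c"),
         ("d", "e"), ("d", "f"), ("e", "f"),
         ("c", "d")]
adj = build_adjacency(edges)
labels = {
    "a": {"x", "y", "z"}, "b": {"x", "y"}, "c": {"x", "y", "z"},
    "d": {"x", "y"}, "e": {"x", "y", "z"}, "f": {"x", "y"},
}
cliques = [{"a", "b", "c"}, {"d", "e", "f"}]
```

With this toy data, the two triangles pass the check when at least two shared labels are required (all six vertices carry `x` and `y`) and fail it when three are required.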
Notes
However, this does not require that cliques in \(M\) are maximal cliques in the whole graph \(\mathcal{G}\).
For the sake of simplicity in the examples, a clique is simply denoted by a sequence of letters representing the clique vertices.
However, if needed, a simple post-processing can be used to remove these small cliques.
http://string-db.org/, snapshot of November 2009.
This confidence is a measure provided by STRING. A high confidence means that there is strong evidence that the interaction exists.
We retain the term list used in L2L; however, these lists are simply sets.
The category of the lists containing genes categorized under biological process in GeneOntology [1].
Sensory perception of light stimulus is defined as the series of events required for an organism to receive a sensory light stimulus, convert it to a molecular signal, and recognize and characterize the signal.
References
Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G (2000) Gene ontology: tool for the unification of biology. Nat Genet 25:25–29
Becquet C, Blachon S, Jeudy B, Boulicaut JF, Gandrillon O (2002) Strong-association-rule mining for large-scale gene-expression data analysis: a case study on human SAGE data. Genome Biol 3(12):1–16
Berlingerio M, Bonchi F, Bringmann B, Gionis A (2009) Mining graph evolution rules. In: European conference on machine learning and knowledge discovery in databases (ECML/PKDD), pp 115–130
Boden B (2012) Efficient combined clustering of graph and attribute data. In: PhD Workshop of the 38th international conference on very large data bases (VLDB), pp 13–18
Bonchi F, Giannotti F, Mazzanti A, Pedreschi D (2005) Efficient breadth-first mining of frequent pattern with monotone constraints. Knowl Inf Syst (KAIS) 8(2):131–153
Bringmann B, Nijssen S (2008) What is frequent in a single graph? In: Pacific-Asia conference on knowledge discovery and data mining (PAKDD), pp 858–863
Bringmann B, Zimmermann A (2009) One in a million: picking the right patterns. Knowl Inf Syst (KAIS) 18(1):61–81
Calders T, Ramon J, Dyck DV (2008) Anti-monotonic overlap-graph support measures. In: International conference on data mining (ICDM), pp 73–82
Chakrabarti D, Faloutsos C (2006) Graph mining: laws, generators, and algorithms. ACM Comput Surv 38(1):1–69
Erdös P, Rényi A (1959) On random graphs. Publicationes Mathematicae 6:290–297
Fukuzaki M, Seki M, Kashima H, Sese J (2010) Finding itemset-sharing patterns in a large itemset-associated graph. In: Pacific-Asia conference on knowledge discovery and data mining (PAKDD), pp 147–159
Gallo A, Miettinen P, Mannila H (2008) Finding subgroups having several descriptions: algorithms for redescription mining. In: SIAM international conference on data mining (SDM), pp 334–345
Ge R, Ester M, Gao BJ, Hu Z, Bhattacharya B, Ben-Moshe B (2008) Joint cluster analysis of attribute data and relationship data: the connected k-center problem, algorithms and applications. ACM Trans Knowl Discov Data (TKDD) 2(2):1–35
Hanisch D, Zien A, Zimmer R, Lengauer T (2002) Co-clustering of biological networks and gene expression data. Bioinformatics 18:145–154
Jensen LJ, Kuhn M, Stark M, Chaffron S, Creevey C, Muller J, Doerks T, Julien P, Roth A, Simonovic M, Bork P, von Mering C (2009) STRING 8—a global view on proteins and their functional interactions in 630 organisms. Nucl Acids Res 37:412–416
Jiang D, Pei J (2009) Mining frequent cross-graph quasi-cliques. ACM Trans Knowl Discov Data (TKDD) 2(4):1–42
Kang U, Tsourakakis CE, Faloutsos C (2011) Pegasus: mining peta-scale graphs. Knowl Inf Syst (KAIS) 27(2):303–325
Khan A, Yan X, Wu KL (2010) Towards proximity pattern mining in large graphs. In: ACM SIGMOD international conference on management of data, pp 867–878
Knobbe AJ, Ho EKY (2006) Pattern teams. In: European conference on machine learning and knowledge discovery in databases (ECML/PKDD), pp 577–584
Kuramochi M, Karypis G (2005) Finding frequent patterns in a large sparse graph. Data Mining Knowl Discov (DMKD) 11(3):243–271
Lahiri M, Berger-Wolf TY (2010) Periodic subgraph mining in dynamic networks. Knowl Inf Syst (KAIS) 24(3):467–497
Leyritz J, Schicklin S, Blachon S, Keime C, Robardet C, Boulicaut JF, Besson J, Pensa RG, Gandrillon O (2008) SQUAT: a web tool to mine human, murine and avian SAGE data. BMC Bioinform 9(1):378
Liu G, Wong L (2008) Effective pruning techniques for mining quasi-cliques. In: European conference on machine learning and knowledge discovery in databases (ECML/PKDD), pp 33–49
Miyoshi Y, Ozaki T, Ohkawa T (2009) Frequent pattern discovery from a single graph with quantitative itemsets. In: International workshop on mining multiple information sources (ICDM workshop), pp 527–532
Moon J, Moser L (1965) On cliques in graphs. Israel J Math 3:23–28
Moser F, Colak R, Rafiey A, Ester M (2009) Mining cohesive patterns from graphs with feature vectors. In: SIAM international conference on data mining (SDM), pp 593–604
Mougel PN, Plantevit M, Rigotti C, Gandrillon O, Boulicaut JF (2010) Constraint-based mining of sets of cliques sharing vertex properties. In: International workshop on analysis of complex networks (ECML/PKDD workshop), pp 1–14
Newman JC, Weiner AM (2005) L2L: a simple tool for discovering the hidden significance in microarray expression data. Genome Biol 6(9):81
Nguyen KN, Cerf L, Plantevit M, Boulicaut JF (2011) Multidimensional association rules in boolean tensors. In: SIAM international conference on data mining (SDM), pp 570–581
Raedt LD, Zimmermann A (2007) Constraint-based pattern set mining. In: SIAM international conference on data mining (SDM), pp 1–12
Silva A, Meira W, Zaki MJ (2010) Structural correlation pattern mining for large graphs. In: International workshop on mining and learning with graphs (MLG), pp 119–126
Silva A, Meira W, Zaki MJ (2012) Mining attribute-structure correlated patterns in large attributed graphs. Proc VLDB Endow (PVLDB) 5(5):466–477
Tomita E, Tanaka A, Takahashi H (2006) The worst-case time complexity for generating all maximal cliques and computational experiments. Theor Comput Sci (TCS) 363:28–42
Ulitsky I, Shamir R (2007) Identification of functional modules using network topology and high-throughput data. BMC Syst Biol 1:8
Xu Z, Ke Y, Wang Y, Cheng H, Cheng J (2012) A model-based approach to attributed graph clustering. In: Proceedings of the 2012 ACM SIGMOD international conference on management of data, New York, NY, USA, pp 505–516
Yan X, Han J (2002) gSpan: graph-based substructure pattern mining. In: International conference on data mining (ICDM), pp 721–724
Zhou Y, Cheng H, Yu JX (2009) Graph clustering based on structural/attribute similarities. Proc VLDB Endow (PVLDB) 2(1):718–729
Zhou Y, Cheng H, Yu JX (2010) Clustering large attributed graphs: an efficient incremental approach. In: International conference on data mining (ICDM), pp 689–698
Acknowledgments
We would like to thank the anonymous referees for their constructive comments and useful suggestions. This work is partly funded by the Rhône-Alpes Complex Systems Institute (IXXI) through the project REHMI and by the French National Research Agency (ANR) through the projects FOSTER (ANR-2010-COSI-012) and BINGO2 (ANR-07-MDCO-014).
Cite this article
Mougel, PN., Rigotti, C., Plantevit, M. et al. Finding maximal homogeneous clique sets. Knowl Inf Syst 39, 579–608 (2014). https://doi.org/10.1007/s10115-013-0625-y
DOI: https://doi.org/10.1007/s10115-013-0625-y