Finding maximal homogeneous clique sets

Mougel, Pierre-Nicolas; Rigotti, Christophe; Plantevit, Marc; Gandrillon, Olivier

doi:10.1007/s10115-013-0625-y

Regular Paper
Published: 27 March 2013

Volume 39, pages 579–608 (2014)
Knowledge and Information Systems

Pierre-Nicolas Mougel¹, Christophe Rigotti¹, Marc Plantevit² & Olivier Gandrillon³

Abstract

Many datasets can be encoded as graphs with sets of labels associated with the vertices. We consider such graphs and propose to look for patterns called maximal homogeneous clique sets, where such a pattern is a subgraph structured into several large cliques in which all vertices share a minimum number of labels. We present an algorithm based on graph enumeration that computes all patterns satisfying user-defined constraints on the number of separated cliques, on the size of these cliques, and on the number of labels shared by all the vertices. Our approach is evaluated on real datasets: a social network of scientific collaborations and a biological network of protein–protein interactions. The experiments show that the patterns are useful for exhibiting subgraphs organized in several core modules of interactions. Performance is reported on both real and synthetic data, showing that the approach can be applied to different kinds of large datasets.
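The three kinds of constraints named in the abstract can be illustrated with a small checker. This is a minimal sketch, not the authors' enumeration algorithm: it only tests whether one given candidate set of cliques satisfies the constraints, and it does not check maximality. The names `satisfies_constraints`, `min_cliques`, `min_size` and `min_labels` are hypothetical, and reading "separated cliques" as pairwise vertex-disjoint cliques is an assumption; the full paper gives the precise definition. In line with the notes, cliques are written as letter sequences such as "abc" and are not required to be maximal cliques of the whole graph.

```python
from itertools import combinations


def is_clique(adj, vertices):
    """True if every pair of `vertices` is adjacent in the adjacency map `adj`."""
    return all(v in adj.get(u, set()) for u, v in combinations(vertices, 2))


def shared_labels(labels, vertices):
    """Set of labels carried by every vertex in `vertices` (empty if no vertices)."""
    vertices = list(vertices)
    if not vertices:
        return set()
    common = set(labels.get(vertices[0], set()))
    for v in vertices[1:]:
        common &= set(labels.get(v, set()))
    return common


def satisfies_constraints(adj, labels, cliques, min_cliques, min_size, min_labels):
    """Check a candidate clique set against three user-defined constraints.

    ASSUMPTION (hypothetical reading): "separated" means pairwise vertex-disjoint.
    Maximality is NOT checked; cliques need not be maximal in the whole graph.
    """
    cliques = [frozenset(c) for c in cliques]
    # Constraint on the number of cliques in the set.
    if len(cliques) < min_cliques:
        return False
    # Constraint on clique size; each member must also be a genuine clique.
    if any(len(c) < min_size or not is_clique(adj, c) for c in cliques):
        return False
    # Separation (assumed meaning): no two cliques share a vertex.
    if any(a & b for a, b in combinations(cliques, 2)):
        return False
    # Constraint on the number of labels shared by all vertices of the pattern.
    return len(shared_labels(labels, set().union(*cliques))) >= min_labels


# Toy example: triangles {a, b, c} and {d, e, f} joined by the edge c-d.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("d", "e"), ("d", "f"), ("e", "f"), ("c", "d")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)
labels = {"a": {"x", "y"}, "b": {"x", "y", "z"}, "c": {"x", "y"},
          "d": {"x", "y"}, "e": {"x", "y", "w"}, "f": {"x", "y"}}

print(satisfies_constraints(adj, labels, ["abc", "def"], 2, 3, 2))  # True
print(satisfies_constraints(adj, labels, ["abc", "def"], 2, 3, 3))  # False
```

A complete miner would search the space of such candidate sets; the paper's enumeration strategy and its notion of maximality are described in the full text and are not reproduced by this sketch.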

Similar content being viewed by others

A Practical Fixed-Parameter Algorithm for Constructing Tree-Child Networks from Multiple Binary Trees
Article 15 February 2022

Algorithms for generating all possible spanning trees of a simple undirected connected graph: an extensive review
Article Open access 13 August 2018

Recent Advances in Graph Partitioning

Notes

However, this does not require that the cliques in \(M\) be maximal cliques in the whole graph \(\mathcal{G}\).
For the sake of simplicity in the examples, a clique is simply denoted by a sequence of letters representing the clique vertices.
However, if needed, a simple post-processing can be used to remove these small cliques.
http://www.informatik.uni-trier.de/~ley/db/.
http://string-db.org/, snapshot of November 2009.
http://bsmc.insa-lyon.fr/squat/.
This confidence is a measure provided by STRING. A high confidence means that there is strong evidence that the interaction exists.
http://www.genenames.org/.
We retain the term "list" used in L2L; however, these lists are simply sets.
The category of lists containing genes categorized under "biological process" in Gene Ontology [1].
Sensory perception of light stimulus is defined as the series of events required for an organism to receive a sensory light stimulus, convert it to a molecular signal, and recognize and characterize the signal.
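One of the notes above mentions that a simple post-processing step can remove small cliques from a pattern. Which cliques count as "small" is specified in the full text, so the following is only a minimal sketch under the assumption of a plain size threshold; `remove_small_cliques` and `min_size` are hypothetical names. Cliques use the letter-sequence notation described in the notes.

```python
def remove_small_cliques(cliques, min_size):
    """Keep only cliques with at least `min_size` distinct vertices.

    Cliques are letter sequences such as "abc" (one letter per vertex).
    ASSUMPTION: "small" is read as a hypothetical size threshold `min_size`.
    """
    return [c for c in cliques if len(set(c)) >= min_size]


pattern = ["abcd", "efg", "hi"]
print(remove_small_cliques(pattern, 3))  # ['abcd', 'efg']
```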

References

Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G (2000) Gene ontology: tool for the unification of biology. Nat Genet 25:25–29
Becquet C, Blachon S, Jeudy B, Boulicaut JF, Gandrillon O (2002) Strong-association-rule mining for large-scale gene-expression data analysis: a case study on human SAGE data. Genome Biol 3(12):1–16
Berlingerio M, Bonchi F, Bringmann B, Gionis A (2009) Mining graph evolution rules. In: European conference on machine learning and knowledge discovery in databases (ECML/PKDD), pp 115–130
Boden B (2012) Efficient combined clustering of graph and attribute data. In: PhD Workshop of the 38th international conference on very large data bases (VLDB), pp 13–18
Bonchi F, Giannotti F, Mazzanti A, Pedreschi D (2005) Efficient breadth-first mining of frequent pattern with monotone constraints. Knowl Inf Syst (KAIS) 8(2):131–153
Bringmann B, Nijssen S (2008) What is frequent in a single graph? In: Pacific-Asia conference on knowledge discovery and data mining (PAKDD), pp 858–863
Bringmann B, Zimmermann A (2009) One in a million: picking the right patterns. Knowl Inf Syst (KAIS) 18(1):61–81
Calders T, Ramon J, Dyck DV (2008) Anti-monotonic overlap-graph support measures. In: International conference on data mining (ICDM), pp 73–82
Chakrabarti D, Faloutsos C (2006) Graph mining: laws, generators, and algorithms. ACM Comput Surv 38(1):1–69
Erdös P, Rényi A (1959) On random graphs. Publicationes Mathematicae 6:290–297
Fukuzaki M, Seki M, Kashima H, Sese J (2010) Finding itemset-sharing patterns in a large itemset-associated graph. In: Pacific-Asia conference on knowledge discovery and data mining (PAKDD), pp 147–159
Gallo A, Miettinen P, Mannila H (2008) Finding subgroups having several descriptions: algorithms for redescription mining. In: SIAM international conference on data mining (SDM), pp 334–345
Ge R, Ester M, Gao BJ, Hu Z, Bhattacharya B, Ben-Moshe B (2008) Joint cluster analysis of attribute data and relationship data: the connected k-center problem, algorithms and applications. ACM Trans Knowl Discov Data (TKDD) 2(2):1–35
Hanisch D, Zien A, Zimmer R, Lengauer T (2002) Co-clustering of biological networks and gene expression data. Bioinformatics 18:145–154
Jensen LJ, Kuhn M, Stark M, Chaffron S, Creevey C, Muller J, Doerks T, Julien P, Roth A, Simonovic M, Bork P, von Mering C (2009) STRING 8—a global view on proteins and their functional interactions in 630 organisms. Nucl Acids Res 37:412–416
Jiang D, Pei J (2009) Mining frequent cross-graph quasi-cliques. ACM Trans Knowl Discov Data (TKDD) 2(4):1–42
Kang U, Tsourakakis CE, Faloutsos C (2011) PEGASUS: mining peta-scale graphs. Knowl Inf Syst (KAIS) 27(2):303–325
Khan A, Yan X, Wu KL (2010) Towards proximity pattern mining in large graphs. In: ACM SIGMOD international conference on management of data, pp 867–878
Knobbe AJ, Ho EKY (2006) Pattern teams. In: European conference on machine learning and knowledge discovery in databases (ECML/PKDD), pp 577–584
Kuramochi M, Karypis G (2005) Finding frequent patterns in a large sparse graph. Data Mining Knowl Discov (DMKD) 11(3):243–271
Lahiri M, Berger-Wolf TY (2010) Periodic subgraph mining in dynamic networks. Knowl Inf Syst (KAIS) 24(3):467–497
Leyritz J, Schicklin S, Blachon S, Keime C, Robardet C, Boulicaut JF, Besson J, Pensa RG, Gandrillon O (2008) SQUAT: a web tool to mine human, murine and avian SAGE data. BMC Bioinform 9(1):378
Liu G, Wong L (2008) Effective pruning techniques for mining quasi-cliques. In: European conference on machine learning and knowledge discovery in databases (ECML/PKDD), pp 33–49
Miyoshi Y, Ozaki T, Ohkawa T (2009) Frequent pattern discovery from a single graph with quantitative itemsets. In: International workshop on mining multiple information sources (ICDM workshop), pp 527–532
Moon J, Moser L (1965) On cliques in graphs. Israel J Math 3:23–28
Moser F, Colak R, Rafiey A, Ester M (2009) Mining cohesive patterns from graphs with feature vectors. In: SIAM international conference on data mining (SDM), pp 593–604
Mougel PN, Plantevit M, Rigotti C, Gandrillon O, Boulicaut JF (2010) Constraint-based mining of sets of cliques sharing vertex properties. In: International workshop on analysis of complex networks (ECML/PKDD workshop), pp 1–14
Newman JC, Weiner AM (2005) L2L: a simple tool for discovering the hidden significance in microarray expression data. Genome Biol 6(9):81
Nguyen KN, Cerf L, Plantevit M, Boulicaut JF (2011) Multidimensional association rules in boolean tensors. In: SIAM international conference on data mining (SDM), pp 570–581
Raedt LD, Zimmermann A (2007) Constraint-based pattern set mining. In: SIAM international conference on data mining (SDM), pp 1–12
Silva A, Meira W, Zaki MJ (2010) Structural correlation pattern mining for large graphs. In: International workshop on mining and learning with graphs (MLG), pp 119–126
Silva A, Meira W, Zaki MJ (2012) Mining attribute-structure correlated patterns in large attributed graphs. Proc VLDB Endow (PVLDB) 5(5):466–477
Tomita E, Tanaka A, Takahashi H (2006) The worst-case time complexity for generating all maximal cliques and computational experiments. Theor Comput Sci (TCS) 363:28–42
Ulitsky I, Shamir R (2007) Identification of functional modules using network topology and high-throughput data. BMC Syst Biol 1:8
Xu Z, Ke Y, Wang Y, Cheng H, Cheng J (2012) A model-based approach to attributed graph clustering. In: Proceedings of the 2012 ACM SIGMOD international conference on management of data, New York, NY, USA, pp 505–516
Yan X, Han J (2002) gSpan: graph-based substructure pattern mining. In: International conference on data mining (ICDM), pp 721–724
Zhou Y, Cheng H, Yu JX (2009) Graph clustering based on structural/attribute similarities. Proc VLDB Endow (PVLDB) 2(1):718–729
Zhou Y, Cheng H, Yu JX (2010) Clustering large attributed graphs: an efficient incremental approach. In: International conference on data mining (ICDM), pp 689–698

Acknowledgments

We would like to thank the anonymous referees for their constructive comments and useful suggestions. This work is partly funded by the Rhône-Alpes Complex Systems Institute (IXXI) through the project REHMI and by the French National Research Agency (ANR) through the projects FOSTER (ANR-2010-COSI-012) and BINGO2 (ANR-07-MDCO-014).

Author information

Authors and Affiliations

Université de Lyon, CNRS, INSA-Lyon, LIRIS, UMR5205, INRIA, 69621, Lyon, France
Pierre-Nicolas Mougel & Christophe Rigotti
Université Lyon, CNRS, Université Lyon 1, LIRIS, UMR5205, 69622, Lyon, France
Marc Plantevit
INRIA Université Lyon, Université Lyon 1, Centre de Génétique et de Physiologie Moléculaire et Cellulaire, (CGPhiMC), CNRS, UMR5534, 69622, Lyon, France
Olivier Gandrillon

Corresponding author

Correspondence to Christophe Rigotti.

About this article

Cite this article

Mougel, PN., Rigotti, C., Plantevit, M. et al. Finding maximal homogeneous clique sets. Knowl Inf Syst 39, 579–608 (2014). https://doi.org/10.1007/s10115-013-0625-y

Received: 21 May 2012
Revised: 06 December 2012
Accepted: 08 March 2013
Published: 27 March 2013
Issue Date: June 2014
DOI: https://doi.org/10.1007/s10115-013-0625-y
