Listing all maximal cliques in large graphs on vertex-centric model

Brighen, Assia; Slimani, Hachem; Rezgui, Abdelmounaam; Kheddouci, Hamamache

doi:10.1007/s11227-019-02770-4

Listing all maximal cliques in large graphs on vertex-centric model

Published: 12 February 2019

Volume 75, pages 4918–4946, (2019)
Cite this article

The Journal of Supercomputing Aims and scope Submit manuscript

Assia Brighen ORCID: orcid.org/0000-0002-7361-9504¹,
Hachem Slimani¹,
Abdelmounaam Rezgui² &
…
Hamamache Kheddouci³

513 Accesses
5 Citations
Explore all metrics

Abstract

Maximal Clique Enumeration (MCE), which consists to enumerate all maximal complete subgraphs in a given graph, is a fundamental problem in graph theory, and it is used in several applications. It is one of Karp’s 21 NP-complete problems. In the literature, this problem has been widely studied. One of the most notable, efficient, successful and extensively used solutions is the Bron–Kerbosch (BK) algorithm. The latter is a sequential algorithm which is able to enumerate all maximal cliques in an undirected graph without duplication. Furthermore, it is used to solve large problems such as maximum clique problem, community detection and graph clustering. However, for large graphs, sequential algorithms are slow and do not scale well. Thus, processing efficiently this kind of graphs needs to develop distributed algorithms under parallel and distributed platforms or large graph mining frameworks. In this setting, we propose new efficient distributed algorithms for maximal clique enumerating based on the vertex-centric model. These algorithms use the BK algorithm principle to deal with the MCE problem on large cluster. The proposed algorithms are implemented in Giraph and evaluated by using real-world graphs and computer-generated benchmark networks. Our experiments on a Hadoop cluster show that the proposed algorithms can effectively process a variety of large real-world and computer-generated graphs and scale well with increasing the dataset size and the number of nodes in the cluster. Furthermore, the proposed algorithms are provably work-efficient compared with other algorithms including BK algorithm.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Efficient Maximal Clique Enumeration Over Graph Data

Article Open access 07 February 2017

A novel parallel local search algorithm for the maximum vertex weight clique problem in large graphs

Article 11 June 2019

Parallel mining of large maximal quasi-cliques

Article 26 November 2021

References

Akkoyunlu EA (1973) The enumeration of maximal cliques of large graphs. SIAM J Comput 2(1):1–6. https://doi.org/10.1137/0202001
Article MathSciNet MATH Google Scholar
Avery C, Kunz C (2011) Giraph: large-scale graph processing infrastructure on Hadoop. In: Proceedings of the 2011 Hadoop Summit, Santa Clara
Bron C, Kerbosch J (1973) Algorithm 457: finding all cliques of an undirected graph. Commun ACM 16(9):575–577. https://doi.org/10.1145/362342.362367
Article MATH Google Scholar
Butenko S, Wilhelm WE (2006) Clique-detection models in computational biochemistry and genomics. Eur J Oper Res 173(1):1–17. https://doi.org/10.1016/j.ejor.2005.05.026
Article MathSciNet MATH Google Scholar
Chen Q, Fang Ch, Wang Z, Suo B, Li Z, Ives ZG (2016) Parallelizing maximal clique enumeration over graph data. In: DASFAA’2016 Proceedings, Part II, of the 21st International Conference on Database Systems for Advanced Applications, vol 9643, pp 249–264. https://doi.org/10.1007/978-3-319-32049-6_16
Cheng J, Zhu L, Ke Y, Chu S (2012) Fast algorithms for maximal clique enumeration with limited memory. In: KDD’12 Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 1240-1248. https://doi.org/10.1145/2339530.2339724
Chiba N, Nishizeki T (1985) Arboricity and subgraph listing algorithms. SIAM J Comput 14(1):210–223. https://doi.org/10.1137/0214017
Article MathSciNet MATH Google Scholar
Ching A, Edunov S, Kabiljo M, Logothetis D, Muthukrishnan S (2015) One trillion edges: graph processing at facebook-scale. In: Proceedings of the 41st International Conference on Very Large Data Bases, Kohala Coast, Hawaii vol 8(12), pp 1804–1815. https://doi.org/10.14778/2824032.2824077
Conte A, Virgilio RD, Maccioni A, Patrignani M, Torlone R (2016) Finding all maximal cliques in very large social networks. In: Proceedings of the 19th International Conference on Extending Database Technology, EDBT 2016, Bordeaux, France, pp 173–184. https://doi.org/10.5441/002/edbt.2016.18
Dasari NS, Ranjan D, Zubair M (2014) Maximal clique enumeration for large graphs on hadoop framework. In: PPAA’14 Proceedings of the First Workshop on Parallel Programming for Analytics Applications pp 21–30. https://doi.org/10.1145/2567634.2567640
Dasari NS, Zubair M, Ranjan D (2013) A novel parallel algorithm for maximal clique enumeration on multicore and distributed memory architectures. https://pdfs.semanticscholar.org/9827/9e2cedb14085886fcb4473f1ba483a3df195.pdf
Dean J, Ghemawat S (2004) MapReduce: simplified data processing on large clusters. In: OSDI’04, The 6th Symposium on Operating System Design and Implementation, vol 6, California, USA. pp 137–150
Doekemeijer N, Varbanescu AL (2014) A Survey of parallel graph processing frameworks. Technical report, Delft University of Technology, Report number PDS-2014-003
Du N, Bin W, Liutong X, Bai W, Xin P (2006) A parallel algorithm for enumerating all maximal cliques in complex network. In: Proceedings of the Sixth IEEE International Conference on Data Mining—Workshops, Hong Kong, China, pp 320–324. https://doi.org/10.1109/ICDMW.2006.17
Elshawi R, Batarfi O, Fayoumi A, Barnawi A, Sakr S (2015) Big graph processing systems: state-of-the-art and open challenges. In: Big Data Computing Service and Applications (BigDataService), 2015 IEEE First International Conference on Big Data Computing Service and Applications, pp 24–33. https://doi.org/10.1109/BigDataService.2015.11
Elser B, Montresor A (2013) An evaluation study of bigdata frameworks for graph processing. In: 2013 IEEE International Conference on Big Data, pp 60–67. https://doi.org/10.1109/BigData.2013.6691555
Eppstein D, Loffler M, Strash D (2010) Listing all maximal cliques in sparse graphs in near-optimal time. In: Cheong O, Chwa KY, Park K (eds) Algorithms and Computation. ISAAC 2010. Lecture Notes in Computer Science, vol 6506, Springer, Berlin, Heidelberg. pp 403–414. https://doi.org/10.1007/978-3-642-17517-6_36
Fehér P, Asztalos M, Vajk T, Mészàros T, Lengyel L (2017) Detecting subgraph isomorphism with MapReduce. J Supercomput 73(5):1810–1851. https://doi.org/10.1007/s11227-016-1885-6
Article Google Scholar
Giraph A Apache giraph!. https://giraph.apache.org/. Accessed 17 Feb 2018
Golumbic MC (1980) Algorithmic graph theory and perfect graphs. Academic Press, New York
MATH Google Scholar
Gonzalez JE, Low Y, Gu H, Bickson D, Guestrin C (2012) PowerGraph: distributed graph-parallel computation on natural graphs. In: OSDI’12 Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation, CA, USA, pp 17–30
Guo Y, Biczak M, Varbanescu AL, Iosup A, Martella C, Willke TL (2014) How well do graph-processing platforms perform? An empirical performance evaluation and analysis. In: 2014 IEEE 28th International Parallel and Distributed Processing Symposium, pp 395–404. https://doi.org/10.1109/IPDPS.2014.49
Guo Y, Varbanescu AL, Iosup A, Martella C, Willke TL (2014) Benchmarking graph-processing platforms: a vision. In: ICPE’14 Proceedings of the 5th ACM/SPEC International Conference on Performance Engineering, pp 289–292. https://doi.org/10.1145/2568088.2576761
Hadoop A Hadoop. http://hadoop.apache.org. Accessed 17 Feb 2018
Han M, Daudjee K (2015) Giraph unchained: barrierless asynchronous parallel execution in pregel like graph processing systems. In: Proceedings of the VLDB Endowment, vol 8(9), pp 950–961. https://doi.org/10.14778/2777598.2777604
Harary F, Ross IC (1957) A procedure for clique detection using the group matrix. Sociometry 20(3):205–215
Article MathSciNet Google Scholar
Harley E, Bonner A, Goodman N (2001) Uniform integration of genome mapping data using intersection graphs. Bioinformatics 17(6):487–494. https://doi.org/10.1093/bioinformatics/17.6.487
Article Google Scholar
Horaud R, Skordas T (1989) Stereo correspondence through feature grouping and maximal cliques. IEEE Trans Pattern Anal Mach Intell 11(11):1168–1180. https://doi.org/10.1109/34.42855
Article Google Scholar
Hou R, Wang C, Zhu Q, Li J (2014) Interference-aware QoS multicast routing for smart grid. Ad Hoc Netw 22:13–26. https://doi.org/10.1016/j.adhoc.2014.05.008
Article Google Scholar
Kaalia R, Srinivasan A, Kumar A, Ghosh I (2016) ILP-assisted de novo drug design. Mach Learn 103(3):309–341. https://doi.org/10.1007/s10994-016-5556-x
Article MathSciNet Google Scholar
Kajdanowicz T, Kazienko P, Indyk W (2014) Parallel processing of large graphs. Future Gener Comput Syst 32:324–337. https://doi.org/10.1016/j.future.2013.08.007
Article Google Scholar
Kalavri V, Vlassov V, Haridi S (2018) High-level programming abstractions for distributed graph processing. IEEE Trans Knowl Data Eng 30(2):305–324. https://doi.org/10.1109/TKDE.2017.2762294
Article Google Scholar
Karp RM (1972) Reducibility among combinatorial problems. In: Miller RE, Thatcher JW (eds) Complexity of computer computations. Plenum Press, New York, pp 85–104
Chapter Google Scholar
Koichi S, Arisaka M, Koshino H, Aoki A, Iwata S, Uno T, Satoh H (2014) Chemical structure elucidation from 13C NMR chemical shifts: efficient data processing using bipartite matching and maximal clique algorithms. J Chem Inf Model 54(4):1027–1035. https://doi.org/10.1021/ci400601c
Article Google Scholar
Lancichinetti A, Fortunato S, Radicchi F (2008) Benchmark graphs for testing community detection algorithms. Phys Rev E 78(4):046110
Article Google Scholar
Leskovec J, Krevl A (2014) SNAP datasets: stanford large network dataset collection. http://snap.stanford.edu/data
Liu HF, Su CT, Chu AC (2013) Fast quasi-biclique mining with giraph. In: BIGDATACONGRESS’13 Proceedings of the 2013 IEEE International Congress on Big Data, pp 347–354. https://doi.org/10.1109/BigData.Congress.2013.53
Lu L, Gu Y, Grossman R (2010) dMaximalCliques: a distributed algorithm for enumerating all maximal cliques and maximal clique distribution. In: ICDMW’10 Proceedings of the 2010 IEEE International Conference on Data Mining Workshops, pp 1320–1327. https://doi.org/10.1109/ICDMW.2010.13
Low Y, Gonzalez J, Kyrola A, Bickson D, Guestrin C, Hellerstein JM (2012) Dstributed GraphLab: a framework for machine learning and data mining in the Cloud. In: Proceedings of the VLDB Endowment vol 5(8), pp 716–727. https://doi.org/10.14778/2212351.2212354
Malewicz G, Austern MH, Bik AJC, Dehnert JC, Horn I, Leiser N, Czajkowski G (2010) Pregel: a system for large-scale graph processing. In: SIGMOD’10 Proceedings of the 2010 ACM SIGMOD International Conference on Management of data, pp 135–146, Indiana, USA. https://doi.org/10.1145/1807167.1807184
Martella C, Shaposhnik R, Logothetis D (2015) Practical graph analytics with apache giraph. Apress, Berkely
Book Google Scholar
Molzahn DK, Holzer JT, Lesieutre BC, DeMarco CL (2013) Implementation of a large-scale optimal power flow solver based on semidefinite programming. IEEE Trans Power Syst 28(4):3987–3998. https://doi.org/10.1109/TPWRS.2013.2258044
Article Google Scholar
Mukherjee AP, Tirthapura S (2014) Enumerating maximal bicliques from a large graph using MapReduce. In: 2014 IEEE International Congress on Big Data, pp 707–716. https://doi.org/10.1109/BigData.Congress.2014.105
Pan L, Santos EE (2008) An anytime-anywhere approach for maximal clique enumeration in social network analysis. In: 2008 IEEE International Conference on Systems, Man and Cybernetics, pp 3529–3535. https://doi.org/10.1109/ICSMC.2008.4811845
Prosser P (2012) Exact algorithms for maximum clique: a computational study. Algorithms 5(4):545–587. https://doi.org/10.3390/a5040545
Article MathSciNet MATH Google Scholar
Sakr S (2013) Processing large-scale graph data: A guide to current technology. IBM Developerworks
Sakr S, Orakzai FM, Abdelaziz I, Khayyat Z (2016) Large-Scale graph processing using Apache Giraph. Springer. https://doi.org/10.1007/978-3-319-47431-1
Salem S, Ozcaglar C (2013) MFMS: Maximal frequent module set mining from multiple human gene expression data sets. In: Proceedings of the 12th International Workshop on Data Mining in Bioinformatics, pp 51–57. https://doi.org/10.1145/2500863.2500869
Schmidt M, Samatova N, Thomas K, Park B (2009) A scalable, parallel algorithm for maximal clique enumeration. J Parallel Distrib Comput 69(4):417–428. https://doi.org/10.1016/j.jpdc.2009.01.003
Article Google Scholar
Shrawak P, Kagzi T, Singh AP, Dobariya B, Lokhande P, Alhat BR (2017) Robotic algorithm development. IJCSIT 8(1):116–119
Google Scholar
Spark A (2018) Lightning-fast unified analytics engine. https://spark.apache.org/. Accessed 18 Nov 2018
Svendsen M, Mukherjee AP, Tirthapura S (2015) Mining maximal cliques from a large graph using MapReduce: tackling highly uneven subproblem sizes. J Parallel Distrib Comput 79–80:104–114. https://doi.org/10.1016/j.jpdc.2014.08.011
Article Google Scholar
Tian Y, Balmin A, Corsten SA, Tatikonda S, McPherson J (2013) From “Think Like aVertex” to “Think Like a Graph”. In: Proceedings of the VLDB Endowment, vol 7(3), pp 193–204. https://doi.org/10.14778/2732232.2732238
Tomita E, Akutsu T, Matsunaga T (2011) Efficient algorithms for finding maximum and maximal cliques: effective tools for bioinformatics. Biomed Eng Trends Electron Commun Softw. https://doi.org/10.5772/13245
Article Google Scholar
Tomita E, Tanakaa A, Takahashia H (2006) The worst-case time complexity for generating all maximal cliques and computational experiments. Theor Comput Sci 363(1):28–42. https://doi.org/10.1016/j.tcs.2006.06.015
Article MathSciNet Google Scholar
Valiant LG (1990) A bridging model for parallel computation. Commun ACM 33(8):103–111. https://doi.org/10.1145/79173.79181
Article Google Scholar
Vlaic S, Conrad T, Tokarski-Schnelle C, Gustafsson M, Dahmen U, Guthke R, Schuster S (2018) ModuleDiscoverer: identification of regulatory modules in protein–protein interaction networks. Sci Rep 8(1):1–11. https://doi.org/10.1038/s41598-017-18370-2
Article Google Scholar
Wu B, Yang S, Zhao H, Wang B (2009) A distributed algorithm to enumerate all maximal cliques in mapreduce. In: Proceedings of the Fourth International Conference on Frontier of Computer Science and Technology, pp 45–51. https://doi.org/10.1109/FCST.2009.30
Xin RS, Crankshaw D, Dave A, Gonzalez JE, Franklin MJ, Stoica I (2014) GraphX: Unifying data-parallel and graph-parallel analytics. In arXiv preprint arXiv:1402.2394
Xin RS, Gonzalez JE, Franklin MJ, Stoica I (2013) GraphX: a resilient distributed graph system on spark. In: GRADES’13 First International Workshop on Graph Data Management Experiences and Systems Article No. 2, New York, USA. https://doi.org/10.1145/2484425.2484427
Xu Y, Cheng J, Fu AW, Bu Y (2014) Distributed maximal clique computation. In: BIGDATACONGRESS’14 Proceedings of the 2014 IEEE International Congress on Big Data, pp 160–167. https://doi.org/10.1109/BigData.Congress.2014.31
Xu Y, Cheng J, Fu AW (2016) Distributed maximal clique computation and management. IEEE Trans Serv Comput 9(1):110–122. https://doi.org/10.1109/TSC.2015.2479225
Article Google Scholar
Yuan P, Zhang W, Xie C, Jin H, Liu L, Lee K (2014) Fast iterative graph computation: a path centric approach. In: SC’14 Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp 401–412. https://doi.org/10.1109/SC.2014.38
Zhang Y, Ren J, Liu J, Xu C, Guo H, Liu Y (2017) A survey on emerging computing paradigms for big data. CJE 26(1):1–12. https://doi.org/10.1049/cje.2016.11.016
Article Google Scholar

Download references

Acknowledgement

The authors are grateful to the anonymous referees for their valuable suggestions and comments which have helped to improve the quality of the paper and its presentation.

Author information

Authors and Affiliations

LIMED Laboratory, Computer Science Department, University of Bejaia, 06000, Bejaia, Algeria
Assia Brighen & Hachem Slimani
School of Information Technology Illinois State University, Campus Box 5150, Normal, IL, 61790-5150, USA
Abdelmounaam Rezgui
CNRS, Université Lyon 1, LIRIS, UMR5205, Université de Lyon, 69622, Lyon, France
Hamamache Kheddouci

Authors

Assia Brighen
View author publications
You can also search for this author in PubMed Google Scholar
Hachem Slimani
View author publications
You can also search for this author in PubMed Google Scholar
Abdelmounaam Rezgui
View author publications
You can also search for this author in PubMed Google Scholar
Hamamache Kheddouci
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Assia Brighen.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Brighen, A., Slimani, H., Rezgui, A. et al. Listing all maximal cliques in large graphs on vertex-centric model. J Supercomput 75, 4918–4946 (2019). https://doi.org/10.1007/s11227-019-02770-4

Download citation

Published: 12 February 2019
Issue Date: 01 August 2019
DOI: https://doi.org/10.1007/s11227-019-02770-4

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Listing all maximal cliques in large graphs on vertex-centric model

Abstract

Access this article

Similar content being viewed by others

Efficient Maximal Clique Enumeration Over Graph Data

A novel parallel local search algorithm for the maximum vertex weight clique problem in large graphs

Parallel mining of large maximal quasi-cliques

References

Acknowledgement

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher’s Note

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Listing all maximal cliques in large graphs on vertex-centric model

Abstract

Access this article

Similar content being viewed by others

Efficient Maximal Clique Enumeration Over Graph Data

A novel parallel local search algorithm for the maximum vertex weight clique problem in large graphs

Parallel mining of large maximal quasi-cliques

References

Acknowledgement

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher’s Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation