Four node graphlet and triad enumeration on distributed platforms

Distributed and Parallel Databases

Abstract

Graphlet enumeration is a basic task in graph analysis with many applications, so it is important to be able to perform it within a reasonable amount of time. This is challenging when the input graph is very large, with millions of nodes and edges, and known solutions are limited in the scale of graph that they can process. Distributed computing is often proposed as a way to raise that limit, but it has to be applied carefully to keep the overhead low enough that the distributed solution actually pays off. We study the enumeration of four-node graphlets in undirected graphs and of triads in directed graphs on a distributed platform. We propose an efficient distributed solution that significantly surpasses existing solutions in both scale and performance. With this method, we are able to process larger graphs that have not been processed before and to enumerate quadrillions of graphlets using a modest cluster of machines. Our experimental results show that our solution has strong machine scalability, close to one.
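To make the task concrete, the following is a minimal single-machine sketch of what enumerating four-node graphlets means. It is a brute-force illustration only, not the distributed method proposed in the article. The function name `count_four_node_graphlets` and the graphlet labels are assumptions chosen for this example. The sketch inspects every set of four nodes, so it is only practical for very small graphs.

```python
from itertools import combinations


def count_four_node_graphlets(edges):
    """Count connected induced four-node subgraphs by type (brute force).

    `edges` is an iterable of (u, v) pairs describing an undirected graph.
    Node labels must be mutually comparable (for example, all integers).
    """
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # Each of the six connected four-node graphlets has a distinct sorted
    # degree sequence within the induced subgraph. The labels are
    # illustrative names, not taken from the article.
    names = {
        (1, 1, 2, 2): "3-path",
        (1, 1, 1, 3): "3-star",
        (2, 2, 2, 2): "4-cycle",
        (1, 2, 2, 3): "tailed triangle",
        (2, 2, 3, 3): "diamond",
        (3, 3, 3, 3): "4-clique",
    }
    counts = {name: 0 for name in names.values()}

    for quad in combinations(sorted(adj), 4):
        members = set(quad)
        # Degree of each node restricted to the four chosen nodes.
        degrees = tuple(sorted(len(adj[v] & members) for v in quad))
        name = names.get(degrees)  # None for disconnected subgraphs
        if name is not None:
            counts[name] += 1
    return counts


if __name__ == "__main__":
    # A diamond on nodes 0..3 with an extra node 4 attached to node 0.
    sample = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2), (0, 4)]
    print(count_four_node_graphlets(sample))
```

The number of four-node subsets grows with the fourth power of the node count, which is why exhaustive checking like this does not extend to graphs with millions of nodes.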

Notes

  1. By modest here we mean a cluster with fewer than one thousand machines.

  2. This is the name given by the authors of [30], and as far as we know is not an abbreviation.

  3. It was not named in the original paper.

  4. We corrected lines 9 and 12 of the original version of this algorithm so that all wedges are enumerated properly.

  5. Each worker is equivalent to a physical CPU core.

  6. https://docs.computecanada.ca/wiki/Cedar.

References

  1. Hu, H., Yan, X., Huang, Y., Han, J., Zhou, X.J.: Mining coherent dense subgraphs across massive biological networks for functional discovery. Bioinformatics 21(suppl–1), 213–221 (2005)

  2. Milenković, T., Pržulj, N.: Uncovering biological network function via graphlet degree signatures. Cancer Inf. 6, 680 (2008)

  3. Deshpande, M., Kuramochi, M., Wale, N., Karypis, G.: Frequent substructure-based approaches for classifying chemical compounds. IEEE Trans. Knowl. Data Eng. 17(8), 1036–1050 (2005)

  4. Ralaivola, L., Swamidass, S.J., Saigo, H., Baldi, P.: Graph kernels for chemical informatics. Neural Netw. 18(8), 1093–1110 (2005)

  5. Faust, K.: A puzzle concerning triads in social networks: graph constraints and the triad census. Soc. Netw. 32(3), 221–233 (2010)

  6. Bröcheler, M., Pugliese, A., Subrahmanian, V.S.: Cosi: Cloud oriented subgraph identification in massive social networks. In: Proceedings of the 2010 International Conference on Advances in Social Networks Analysis and Mining, pp. 248–255. IEEE (2010)

  7. Wong, S.W., Cercone, N., Jurisica, I.: Comparative network analysis via differential graphlet communities. Proteomics 15(2–3), 608–617 (2015)

  8. Santoso, Y., Srinivasan, V., Thomo, A.: Efficient enumeration of four node graphlets at trillion-scale. In: Proceedings of the 23rd EDBT, pp. 439–442 (2020)

  9. Pinar, A., Seshadhri, C., Vishal, V.: Escape: Efficiently counting all 5-vertex subgraphs. In: Proceedings of the 26th International Conference on World Wide Web, pp. 1431–1440. International World Wide Web Conferences Steering Committee (2017)

  10. Newman, M.E.: The structure and function of complex networks. SIAM Rev. 45(2), 167–256 (2003)

  11. Wang, J., Cheng, J.: Truss decomposition in massive networks. Proc. VLDB Endow. 5(9), 812 (2012)

  12. Park, H.-M., Myaeng, S.-H., Kang, U.: Pte: enumerating trillion triangles on distributed systems. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1115–1124 (2016)

  13. Hočevar, T., Demšar, J.: A combinatorial approach to graphlet counting. Bioinformatics 30(4), 559–565 (2014)

  14. Rahman, M., Bhuiyan, M.A., Al Hasan, M.: Graft: an efficient graphlet counting method for large graph analysis. IEEE Trans. Knowl. Data Eng. 26(10), 2466–2478 (2014)

  15. Bressan, M., Leucci, S., Panconesi, A.: Motivo: fast motif counting via succinct color coding and adaptive sampling. Proc. VLDB Endow. 12(11), 1651–1663 (2019)

  16. McSherry, F., Isard, M., Murray, D.G.: Scalability! but at what COST? In: Proceedings of the 15th Workshop on Hot Topics in Operating Systems (HotOS XV) (2015)

  17. Park, H.-M., Silvestri, F., Pagh, R., Chung, C.-W., Myaeng, S.-H., Kang, U.: Enumerating trillion subgraphs on distributed systems. ACM Trans. Knowl. Discov. Data (TKDD) 12(6), 1–30 (2018)

  18. Batagelj, V., Zaveršnik, M.: Short cycle connectivity. Discret. Math. 307(3–5), 310–318 (2007)

  19. Tabak, B.M., Takami, M., Rocha, J.M., Cajueiro, D.O., Souza, S.R.: Directed clustering coefficient as a measure of systemic risk in complex banking networks. Physica A 394, 211–216 (2014)

  20. Wasserman, S., Faust, K.: Social Network Analysis: Methods and Applications, vol. 8. Cambridge University Press, Cambridge (1994)

  21. Santoso, Y., Srinivasan, V., Thomo, A., Chester, S.: Triad enumeration at trillion-scale using a single commodity machine. In: Proceedings of the 22nd EDBT (2019)

  22. Schank, T., Wagner, D.: Finding, counting and listing all triangles in large graphs, an experimental study. In: Proceedings of the Experimental and Efficient Algorithms, 4th International Workshop, WEA 2005, Santorini Island, Greece, May 10–13, 2005, Proceedings, pp. 606–609 (2005). https://doi.org/10.1007/11427186_54

  23. Latapy, M.: Main-memory triangle computations for very large (sparse (power-law)) graphs. Theor. Comput. Sci. 407(1–3), 458–473 (2008)

  24. Ahmed, N.K., Neville, J., Rossi, R.A., Duffield, N.: Efficient graphlet counting for large networks. In: Proceedings of the 2015 IEEE International Conference on Data Mining. IEEE, pp. 1–10 (2015)

  25. Bressan, M., Chierichetti, F., Kumar, R., Leucci, S., Panconesi, A.: Counting graphlets: Space vs time. In: Proceedings of the Tenth ACM International Conference on Web Search and Data Mining. ACM, pp. 557–566 (2017)

  26. Wernicke, S., Rasche, F.: Fanmod: a tool for fast network motif detection. Bioinformatics 22(9), 1152–1153 (2006)

  27. Marcus, D., Shavitt, Y.: Rage-a rapid graphlet enumerator for large networks. Comput. Netw. 56(2), 810–819 (2012)

  28. Danisch, M., Balalau, O., Sozio, M.: Listing k-cliques in sparse real-world graphs. In: Proceedings of the 2018 World Wide Web Conference on World Wide Web. International World Wide Web Conferences Steering Committee, pp. 589–598 (2018)

  29. Suri, S., Vassilvitskii, S.: Counting triangles and the curse of the last reducer. In: Proceedings of the 20th International Conference on World Wide Web. WWW ’11. ACM, New York, NY, USA, pp. 607–614 (2011). https://doi.org/10.1145/1963405.1963491

  30. Cordella, L.P., Foggia, P., Sansone, C., Vento, M.: A (sub)graph isomorphism algorithm for matching large graphs. IEEE Trans. Pattern Anal. Mach. Intell. 26(10), 1367–1372 (2004)

  31. Teixeira, C.H., Fonseca, A.J., Serafini, M., Siganos, G., Zaki, M.J., Aboulnaga, A.: Arabesque: a system for distributed graph mining. In: Proceedings of the 25th Symposium on Operating Systems Principles. ACM, pp. 425–440 (2015)

  32. Dias, V., Teixeira, C.H., Guedes, D., Meira, W., Parthasarathy, S.: Fractal: A general-purpose graph pattern mining system. In: Proceedings of the 2019 International Conference on Management of Data, pp. 1357–1374 (2019)

  33. Talukder, N., Zaki, M.J.: A distributed approach for graph mining in massive networks. Data Min. Knowl. Disc. 30(5), 1024–1052 (2016)

  34. Mawhirter, D., Reinehr, S., Holmes, C., Liu, T., Wu, B.: Graphzero: breaking symmetry for efficient graph mining. arXiv:1911.12877 (2019)

  35. Chen, H., Liu, M., Zhao, Y., Yan, X., Yan, D., Cheng, J.: G-miner: an efficient task-oriented graph mining system. In: Proceedings of the Thirteenth EuroSys Conference, pp. 1–12 (2018)

  36. Yan, D., Guo, G., Chowdhury, M.M.R., Özsu, M.T., Ku, W.-S., Lui, J.C.: G-thinker: a distributed framework for mining subgraphs in a big graph. In: 2020 IEEE 36th International Conference on Data Engineering (ICDE). IEEE, pp. 1369–1380 (2020)

  37. Ren, X., Wang, J., Han, W.-S., Yu, J.X.: Fast and robust distributed subgraph enumeration. arXiv:1901.07747 (2019)

  38. Zhang, H., Yu, J.X., Zhang, Y., Zhao, K., Cheng, H.: Distributed subgraph counting: a general approach. Proc. VLDB Endow. 13(12), 2493–2507 (2020)

  39. Batagelj, V., Mrvar, A.: A subquadratic triad census algorithm for large sparse networks with small maximum degree. Soc. Netw. 23(3), 237–243 (2001)

  40. Chin Jr, G., Marquez, A., Choudhury, S., Feo, J.: Scalable triadic analysis of large-scale graphs: Multi-core vs. multi-processor vs. multi-threaded shared memory architectures. In: Proceedings of the 2012 IEEE 24th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD). IEEE, pp. 163–170 (2012)

  41. Parimalarangan, S., Slota, G.M., Madduri, K.: Fast parallel graph triad census and triangle counting on shared-memory platforms. In: Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW). IEEE, pp. 1500–1509 (2017)

  42. Davis, J.A., Leinhardt, S.: The structure of positive interpersonal relations in small groups. Sociol. Theor. Prog. 2, 218–251 (1972)

  43. Seshadhri, C., Pinar, A., Kolda, T.G.: Fast triangle counting through wedge sampling. Proc. SIAM Conf. Data Min. 4, 5 (2013)

  44. Wang, P., Qi, Y., Sun, Y., Zhang, X., Tao, J., Guan, X.: Approximately counting triangles in large graph streams including edge duplicates with a fixed memory usage. Proc. VLDB Endow. 11(2), 162–175 (2017)

  45. Santoso, Y.: Triangle counting and listing in directed and undirected graphs using single machines. Master’s thesis, University of Victoria (2018)

  46. Boldi, P., Vigna, S.: The WebGraph framework I: Compression techniques. In: Proceedings of the Thirteenth International World Wide Web Conference (WWW 2004). ACM Press, Manhattan, USA, pp. 595–601 (2004)

Acknowledgements

We thank the anonymous reviewers for their detailed comments that helped us improve the presentation of this work substantially.

Funding

Funding was provided by NSERC Canada Discovery Grant (RGPIN-2017-04039, RGPIN-2016-04022).

Author information

Corresponding author

Correspondence to Yudi Santoso.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Santoso, Y., Liu, X., Srinivasan, V. et al. Four node graphlet and triad enumeration on distributed platforms. Distrib Parallel Databases 40, 335–372 (2022). https://doi.org/10.1007/s10619-022-07416-8

  • DOI: https://doi.org/10.1007/s10619-022-07416-8
