Abstract
Graphlet enumeration is a basic task in graph analysis with many applications, so it is important to perform it within a reasonable amount of time. This is challenging when the input graph is very large, with millions of nodes and edges, and known solutions are limited in the scale of graph they can process. Distributed computing is often proposed as a way to increase that scale, but it must be applied carefully to keep the overhead low enough that the distributed solution actually pays off. We study the enumeration of four-node graphlets in undirected graphs and of triads in directed graphs on a distributed platform. We propose an efficient distributed solution that significantly surpasses existing solutions in both scale and performance. With this method, we are able to process graphs larger than any processed before and to enumerate quadrillions of graphlets using a modest cluster of machines. Our experimental results show that our solution has strong machine scalability, close to one.










Notes
By modest here we mean a cluster with fewer than one thousand machines.
This is the name given by the authors of [30]; as far as we know, it is not an abbreviation.
It was not named in the original paper.
We corrected lines 9 and 12 of this algorithm relative to the original version so that all wedges are enumerated properly.
Each worker is equivalent to a physical CPU core.
References
Hu, H., Yan, X., Huang, Y., Han, J., Zhou, X.J.: Mining coherent dense subgraphs across massive biological networks for functional discovery. Bioinformatics 21(suppl–1), 213–221 (2005)
Milenković, T., Pržulj, N.: Uncovering biological network function via graphlet degree signatures. Cancer Inf. 6, 680 (2008)
Deshpande, M., Kuramochi, M., Wale, N., Karypis, G.: Frequent substructure-based approaches for classifying chemical compounds. IEEE Trans. Knowl. Data Eng. 17(8), 1036–1050 (2005)
Ralaivola, L., Swamidass, S.J., Saigo, H., Baldi, P.: Graph kernels for chemical informatics. Neural Netw. 18(8), 1093–1110 (2005)
Faust, K.: A puzzle concerning triads in social networks: graph constraints and the triad census. Soc. Netw. 32(3), 221–233 (2010)
Bröcheler, M., Pugliese, A., Subrahmanian, V.S.: Cosi: Cloud oriented subgraph identification in massive social networks. In: Proceedings of the 2010 International Conference on Advances in Social Networks Analysis and Mining, pp. 248–255. IEEE (2010)
Wong, S.W., Cercone, N., Jurisica, I.: Comparative network analysis via differential graphlet communities. Proteomics 15(2–3), 608–617 (2015)
Santoso, Y., Srinivasan, V., Thomo, A.: Efficient enumeration of four node graphlets at trillion-scale. In: Proceedings of the 23rd EDBT, pp. 439–442 (2020)
Pinar, A., Seshadhri, C., Vishal, V.: Escape: Efficiently counting all 5-vertex subgraphs. In: Proceedings of the 26th International Conference on World Wide Web, pp. 1431–1440. International World Wide Web Conferences Steering Committee (2017)
Newman, M.E.: The structure and function of complex networks. SIAM Rev. 45(2), 167–256 (2003)
Wang, J., Cheng, J.: Truss decomposition in massive networks. Proc. VLDB Endow. 5(9), 812 (2012)
Park, H.-M., Myaeng, S.-H., Kang, U.: Pte: enumerating trillion triangles on distributed systems. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1115–1124 (2016)
Hočevar, T., Demšar, J.: A combinatorial approach to graphlet counting. Bioinformatics 30(4), 559–565 (2014)
Rahman, M., Bhuiyan, M.A., Al Hasan, M.: Graft: an efficient graphlet counting method for large graph analysis. IEEE Trans. Knowl. Data Eng. 26(10), 2466–2478 (2014)
Bressan, M., Leucci, S., Panconesi, A.: Motivo: fast motif counting via succinct color coding and adaptive sampling. Proc. VLDB Endow. 12(11), 1651–1663 (2019)
McSherry, F., Isard, M., Murray, D.G.: Scalability! But at what COST? In: Proceedings of the 15th Workshop on Hot Topics in Operating Systems (HotOS XV) (2015)
Park, H.-M., Silvestri, F., Pagh, R., Chung, C.-W., Myaeng, S.-H., Kang, U.: Enumerating trillion subgraphs on distributed systems. ACM Trans. Knowl. Discov. Data (TKDD) 12(6), 1–30 (2018)
Batagelj, V., Zaveršnik, M.: Short cycle connectivity. Discret. Math. 307(3–5), 310–318 (2007)
Tabak, B.M., Takami, M., Rocha, J.M., Cajueiro, D.O., Souza, S.R.: Directed clustering coefficient as a measure of systemic risk in complex banking networks. Physica A 394, 211–216 (2014)
Wasserman, S., Faust, K.: Social Network Analysis: Methods and Applications, vol. 8. Cambridge University Press, Cambridge (1994)
Santoso, Y., Srinivasan, V., Thomo, A., Chester, S.: Triad enumeration at trillion-scale using a single commodity machine. In: Proceedings of the 22nd EDBT (2019)
Schank, T., Wagner, D.: Finding, counting and listing all triangles in large graphs, an experimental study. In: Proceedings of the Experimental and Efficient Algorithms, 4th International Workshop, WEA 2005, Santorini Island, Greece, May 10–13, 2005, Proceedings, pp. 606–609 (2005). https://doi.org/10.1007/11427186_54
Latapy, M.: Main-memory triangle computations for very large (sparse (power-law)) graphs. Theor. Comput. Sci. 407(1–3), 458–473 (2008)
Ahmed, N.K., Neville, J., Rossi, R.A., Duffield, N.: Efficient graphlet counting for large networks. In: Proceedings of the 2015 IEEE International Conference on Data Mining. IEEE, pp. 1–10 (2015)
Bressan, M., Chierichetti, F., Kumar, R., Leucci, S., Panconesi, A.: Counting graphlets: Space vs time. In: Proceedings of the Tenth ACM International Conference on Web Search and Data Mining. ACM, pp. 557–566 (2017)
Wernicke, S., Rasche, F.: Fanmod: a tool for fast network motif detection. Bioinformatics 22(9), 1152–1153 (2006)
Marcus, D., Shavitt, Y.: Rage-a rapid graphlet enumerator for large networks. Comput. Netw. 56(2), 810–819 (2012)
Danisch, M., Balalau, O., Sozio, M.: Listing k-cliques in sparse real-world graphs. In: Proceedings of the 2018 World Wide Web Conference on World Wide Web. International World Wide Web Conferences Steering Committee, pp. 589–598 (2018)
Suri, S., Vassilvitskii, S.: Counting triangles and the curse of the last reducer. In: Proceedings of the 20th International Conference on World Wide Web. WWW ’11. ACM, New York, NY, USA, pp. 607–614 (2011). https://doi.org/10.1145/1963405.1963491
Cordella, L.P., Foggia, P., Sansone, C., Vento, M.: A (sub) graph isomorphism algorithm for matching large graphs. IEEE Trans. Pattern Anal. Mach. Intell. 26(10), 1367–1372 (2004)
Teixeira, C.H., Fonseca, A.J., Serafini, M., Siganos, G., Zaki, M.J., Aboulnaga, A.: Arabesque: a system for distributed graph mining. In: Proceedings of the 25th Symposium on Operating Systems Principles. ACM, pp. 425–440 (2015)
Dias, V., Teixeira, C.H., Guedes, D., Meira, W., Parthasarathy, S.: Fractal: A general-purpose graph pattern mining system. In: Proceedings of the 2019 International Conference on Management of Data, pp. 1357–1374 (2019)
Talukder, N., Zaki, M.J.: A distributed approach for graph mining in massive networks. Data Min. Knowl. Disc. 30(5), 1024–1052 (2016)
Mawhirter, D., Reinehr, S., Holmes, C., Liu, T., Wu, B.: Graphzero: breaking symmetry for efficient graph mining. arXiv:1911.12877 (2019)
Chen, H., Liu, M., Zhao, Y., Yan, X., Yan, D., Cheng, J.: G-miner: an efficient task-oriented graph mining system. In: Proceedings of the Thirteenth EuroSys Conference, pp. 1–12 (2018)
Yan, D., Guo, G., Chowdhury, M.M.R., Özsu, M.T., Ku, W.-S., Lui, J.C.: G-thinker: a distributed framework for mining subgraphs in a big graph. In: 2020 IEEE 36th International Conference on Data Engineering (ICDE). IEEE, pp. 1369–1380 (2020)
Ren, X., Wang, J., Han, W.-S., Yu, J.X.: Fast and robust distributed subgraph enumeration. arXiv:1901.07747 (2019)
Zhang, H., Yu, J.X., Zhang, Y., Zhao, K., Cheng, H.: Distributed subgraph counting: a general approach. Proc. VLDB Endow. 13(12), 2493–2507 (2020)
Batagelj, V., Mrvar, A.: A subquadratic triad census algorithm for large sparse networks with small maximum degree. Soc. Netw. 23(3), 237–243 (2001)
Chin Jr, G., Marquez, A., Choudhury, S., Feo, J.: Scalable triadic analysis of large-scale graphs: Multi-core vs. multi-processor vs. multi-threaded shared memory architectures. In: Proceedings of the 2012 IEEE 24th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD). IEEE, pp. 163–170 (2012)
Parimalarangan, S., Slota, G.M., Madduri, K.: Fast parallel graph triad census and triangle counting on shared-memory platforms. In: Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW). IEEE, pp. 1500–1509 (2017)
Davis, J.A., Leinhardt, S.: The structure of positive interpersonal relations in small groups. Sociol. Theor. Prog. 2, 218–251 (1972)
Seshadhri, C., Pinar, A., Kolda, T.G.: Fast triangle counting through wedge sampling. Proc. SIAM Conf. Data Min. 4, 5 (2013)
Wang, P., Qi, Y., Sun, Y., Zhang, X., Tao, J., Guan, X.: Approximately counting triangles in large graph streams including edge duplicates with a fixed memory usage. Proc. VLDB Endow. 11(2), 162–175 (2017)
Santoso, Y.: Triangle counting and listing in directed and undirected graphs using single machines. Master’s thesis, University of Victoria (2018)
Boldi, P., Vigna, S.: The WebGraph framework I: Compression techniques. In: Proceedings of the Thirteenth International World Wide Web Conference (WWW 2004). ACM Press, Manhattan, USA, pp. 595–601 (2004)
Acknowledgements
We thank the anonymous reviewers for their detailed comments that helped us improve the presentation of this work substantially.
Funding
Funding was provided by NSERC Canada Discovery Grant (RGPIN-2017-04039, RGPIN-2016-04022).
About this article
Cite this article
Santoso, Y., Liu, X., Srinivasan, V. et al. Four node graphlet and triad enumeration on distributed platforms. Distrib Parallel Databases 40, 335–372 (2022). https://doi.org/10.1007/s10619-022-07416-8
DOI: https://doi.org/10.1007/s10619-022-07416-8