Abstract
Subgraph query processing (also known as subgraph search) and subgraph matching are fundamental graph problems in many application domains. A lot of efforts have been made to develop practical solutions for these problems. Despite the efforts, existing algorithms showed limited running time and scalability in dealing with large and/or many graphs. In this paper, we propose a new subgraph search algorithm using equivalences of vertices in order to reduce search space: (1) static equivalence of vertices in a query graph that leads to an efficient matching order of the vertices and (2) dynamic equivalence of candidate vertices in a data graph, which enables us to capture and remove redundancies in search space. These techniques for subgraph search also lead to an improved algorithm for subgraph matching. Experiments show that our approach outperforms state-of-the-art subgraph search and subgraph matching algorithms by up to several orders of magnitude with respect to query processing time.





























Similar content being viewed by others
References
Aberger, C.R., Lamb, A., Tu, S., Nötzli, A., Olukotun, K., Ré, C.: Emptyheaded: A relational engine for graph processing. ACM Trans. Datab. Syst. (TODS) 42(4), 1–44 (2017)
Bhattarai, B., Liu, H., Huang, H.H.: Ceci: compact embedding cluster index for scalable subgraph matching. In: Proceedings of the 2019 International Conference on Management of Data, pp. 1447–1462 (2019)
Bi, F., Chang, L., Lin, X., Qin, L., Zhang, W.: Efficient subgraph matching by postponing cartesian products. In: Proceedings of ACM SIGMOD, pp. 1199–1214 (2016)
Bonnici, V., Ferro, A., Giugno, R., Pulvirenti, A., Shasha, D.: Enhancing graph database indexing by suffix tree structure. In: IAPR International Conference on Pattern Recognition in Bioinformatics, pp. 195–203. Springer, Berlin (2010)
Bonnici, V., Giugno, R., Pulvirenti, A., Shasha, D., Ferro, A.: A subgraph isomorphism algorithm and its application to biochemical data. BMC Bioinf. 14(7), 1–13 (2013)
Cannataro, M., Guzzi, P.H.: Data Management of Protein Interaction Networks, vol. 17. John Wiley and Sons, New Jersey (2012)
Carletti, V., Foggia, P., Saggese, A., Vento, M.: Challenging the time complexity of exact subgraph isomorphism for huge and dense graphs with vf3. IEEE Trans. Pattern Anal. Mach. Intell. 40(4), 804–818 (2017)
Cordella, L.P., Foggia, P., Sansone, C., Vento, M.: A (sub)graph isomorphism algorithm for matching large graphs. IEEE Trans. Pattern Anal. Mach. Intell. 26(10), 1367–1372 (2004)
Di Natale, R., Ferro, A., Giugno, R., Mongiovì, M., Pulvirenti, A., Shasha, D.: Sing: Subgraph search in non-homogeneous graphs. BMC Bioinf. 11(1), 96 (2010)
Fan, W.: Graph pattern matching revised for social network analysis. In: Proceedings of ICDT, pp. 8–21 (2012)
Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman and Co. (1979)
Giugno, R., Bonnici, V., Bombieri, N., Pulvirenti, A., Ferro, A., Shasha, D.: Grapes: a software for parallel searching on biological graphs targeting multi-core architectures. PLoS ONE 8(10), e76911 (2013)
Han, M., Kim, H., Gu, G., Park, K., Han, W.S.: Efficient subgraph matching: Harmonizing dynamic programming, adaptive matching order, and failing set together. In: Proceedings of ACM SIGMOD, pp. 1429–1446 (2019)
Han, W.S., Lee, J., Lee, J.H.: Turbo iso: Towards Ultrafast and Robust Subgraph Isomorphism Search in Large Graph Databases. In: Proceedings of ACM SIGMOD, pp. 337–348 (2013)
Han, W.S., Lee, J., Pham, M.D., Yu, J.X.: igraph: a framework for comparisons of disk-based graph indexing techniques. Proc. VLDB Endow. 3(1–2), 449–459 (2010)
He, H., Singh, A.K.: Graphs-at-a-time: query language and access methods for graph databases. In: Proceedings of ACM SIGMOD, pp. 405–418 (2008)
Kankanamge, C., Sahu, S., Mhedbhi, A., Chen, J., Salihoglu, S.: Graphflow: an active graph database. In: Proceedings of the 2017 ACM International Conference on Management of Data, pp. 1695–1698 (2017)
Katsarou, F., Ntarmos, N., Triantafillou, P.: Performance and scalability of indexed subgraph query processing methods. Proc. VLDB Endow. 8(12), 1566–1577 (2015)
Kim, H., Choi, Y., Park, K., Lin, X., Hong, S.H., Han, W.S.: Versatile equivalences: Speeding up subgraph query processing and subgraph matching. In: Proceedings of ACM SIGMOD, pp. 925–937 (2021)
Kim, J., Shin, H., Han, W.S., Hong, S., Chafi, H.: Taming subgraph isomorphism for rdf query processing. Proc. VLDB Endow. 8(11) (2015)
Kim, K., Seo, I., Han, W.S., Lee, J.H., Hong, S., Chafi, H., Shin, H., Jeong, G.: Turboflux: A fast continuous subgraph matching system for streaming graph data. In: Proceedings of ACM SIGMOD, pp. 411–426 (2018)
Klein, K., Kriege, N., Mutzel, P.: Ct-index: Fingerprint-based graph indexing combining cycles and trees. In: Proceedings of IEEE ICDE, pp. 1115–1126 (2011)
Lee, J., Han, W.S., Kasperovics, R., Lee, J.H.: An in-depth comparison of subgraph isomorphism algorithms in graph databases. Proc. VLDB Endow. 6(2), 133–144 (2012)
Leskovec, J., Krevl, A.: SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data (2014)
Liang, Y., Zhao, P.: Workload-aware subgraph query caching and processing in large graphs. In: Proceedings of IEEE ICDE, pp. 1754–1757 (2019)
McCreesh, C., Prosser, P., Solnon, C., Trimble, J.: When subgraph isomorphism is really hard, and why this matters for graph databases. J. Artif. Intell. Res. 61, 723–759 (2018)
McCreesh, C., Prosser, P., Trimble, J.: Heuristics and really hard instances for subgraph isomorphism problems. In: IJCAI, pp. 631–638 (2016)
McCreesh, C., Prosser, P., Trimble, J.: The glasgow subgraph solver: using constraint programming to tackle hard subgraph isomorphism problem variants. In: International Conference on Graph Transformation, pp. 316–324. Springer (2020)
Mhedhbi, A., Salihoglu, S.: Optimizing subgraph queries by combining binary and worst-case optimal joins. Proc. VLDB Endow. 12(11), 1692–1704 (2019)
Park, H., Kim, M.S.: Evograph: an effective and efficient graph upscaling method for preserving graph properties. In: Proceedings of ACM SIGKDD, pp. 2051–2059 (2018)
Pržulj, N., Corneil, D.G., Jurisica, I.: Efficient estimation of graphlet frequency distributions in protein-protein interaction networks. Bioinformatics 22(8), 974–980 (2006)
Qiao, M., Zhang, H., Cheng, H.: Subgraph matching: on compression and computation. Proc. VLDB Endow. 11(2), 176–188 (2017)
Ren, X., Wang, J.: Exploiting vertex relationships in speeding up subgraph isomorphism over large graphs. Proc. VLDB Endow. 8(5), 617–628 (2015)
Ren, X., Wang, J.: Multi-query optimization for subgraph isomorphism search. Proc. VLDB Endow. 10(3), 121–132 (2016)
Rivero, C.R., Jamil, H.M.: Efficient and scalable labeled subgraph matching using sgmatch. Knowl. Inf. Syst. 51(1), 61–87 (2017)
Sahu, S., Mhedhbi, A., Salihoglu, S., Lin, J., Özsu, M.T.: The ubiquity of large graphs and surprising challenges of graph processing. Proc. VLDB Endow. 11(4), 420–431 (2017)
Shang, H., Zhang, Y., Lin, X., Yu, J.X.: Taming verification hardness: an efficient algorithm for testing subgraph isomorphism. Proc. VLDB Endow. 1(1), 364–375 (2008)
Snijders, T.A., Pattison, P.E., Robins, G.L., Handcock, M.S.: New specifications for exponential random graph models. Sociol. Methodol. 36(1), 99–153 (2006)
Sun, S., Luo, Q.: Scaling up subgraph query processing with efficient subgraph matching. In: Proceedings of IEEE ICDE, pp. 220–231 (2019)
Sun, S., Luo, Q.: In-memory subgraph matching: An in-depth study. In: Proceedings of ACM SIGMOD, pp. 1083–1098 (2020)
Sun, S., Luo, Q.: Subgraph matching with effective matching order and indexing. IEEE Transactions on Knowledge and Data Engineering (2020)
Sun, S., Sun, X., Che, Y., Luo, Q., He, B.: Rapidmatch: a holistic approach to subgraph query processing. Proc. VLDB Endow. 14(2), 176–188 (2020)
Ullmann, J.R.: An algorithm for subgraph isomorphism. J. ACM 23(1), 31–42 (1976)
Wang, J., Ntarmos, N., Triantafillou, P.: Graphcache: a caching system for graph queries, pp. 13–24 (2017)
Wang, J., Ren, X., Anirban, S., Wu, X.W.: Correct filtering for subgraph isomorphism search in compressed vertex-labeled graphs. Inf. Sci. 482, 363–373 (2019)
Yan, X., Yu, P.S., Han, J.: Graph indexing: a frequent structure-based approach. In: Proceedings of ACM SIGMOD, pp. 335–346 (2004)
Yanardag, P., Vishwanathan, S.: Deep graph kernels. In: Proceedings of ACM SIGKDD, pp. 1365–1374 (2015)
Zhang, S., Li, S., Yang, J.: GADDI: Distance index based subgraph matching in biological networks. In: Proceedings of ACM EDBT, pp. 192–203 (2009)
Zhao, P., Han, J.: On graph query optimization in large networks. Proc. VLDB Endow. 3(1–2), 340–351 (2010)
Zhao, P., Yu, J.X., Philip, S.Y.: Graph indexing: Tree+ delta\(>=\) graph. In: Proceedings of VLDB, pp. 938–949 (2007)
Zou, L., Chen, L., Yu, J.X., Lu, Y.: A novel spectral coding in a large graph database. In: Proceedings of EDBT, pp. 181–192 (2008)
Acknowledgements
Hyunjoon Kim was partly supported by Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2020-0-01373, Artificial Intelligence Graduate School Program (Hanyang University)) and the research fund of Hanyang University (HY-202100000003161). Kunsoo Park was supported by Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIT) (No. 2018-0-00551, Framework of Practical Algorithms for NP-hard Graph Problems). Wook-Shin Han was supported by the National Research Foundation of Korea (NRF) grant (No. NRF-2021R1A2B5B03001551) and Institute of Information & communications Technology Planning & Evaluation (IITP) grant (No. 2018-0-01398).
Author information
Authors and Affiliations
Corresponding authors
Ethics declarations
Conflict of interest
Seok-Hee Hong works in the same university as Alan Fekete of the editorial board. Wook-Shin Han has a conflict of interest with Kyu-Young Whang in the editorial board.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
A preliminary version [19] of this paper was presented at Proceedings of the 2021 International Conference on Management of Data (SIGMOD 2021)
Rights and permissions
About this article
Cite this article
Kim, H., Choi, Y., Park, K. et al. Fast subgraph query processing and subgraph matching via static and dynamic equivalences. The VLDB Journal 32, 343–368 (2023). https://doi.org/10.1007/s00778-022-00749-x
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00778-022-00749-x