research-article

Circinus: Fast Redundancy-Reduced Subgraph Matching

Published: 30 May 2023

Abstract

Subgraph matching is one of the most important problems in graph analytics, and many algorithms and systems have been proposed for it. Most of these works follow Ullmann's backtracking approach because it is memory-efficient when handling an explosive number of intermediate matching results. However, they have largely overlooked an intrinsic problem of backtracking, namely repeated computation, which accounts for a large portion of the heavy computation in subgraph matching. This paper proposes a subgraph matching system, Circinus, which enables effective computation sharing through a new compression-based backtracking method. Our extensive experiments show that Circinus significantly reduces repeated computation, which translates into performance improvements of up to several orders of magnitude.
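To make the repeated-computation problem concrete, the following is a minimal Python sketch of plain backtracking matching. It is not Circinus's compression-based method, which the abstract does not describe in detail. The function name `backtrack_match`, the adjacency-dictionary representation, and the toy query and data graphs are all assumptions made for illustration. The sketch counts how often the same neighbour-set intersection is recomputed during the search.

```python
from collections import Counter


def backtrack_match(query_adj, data_adj, order):
    """Enumerate injective, edge-preserving mappings of a query graph into a
    data graph by plain backtracking (illustrative sketch, not Circinus).

    query_adj / data_adj: dict mapping each vertex to the set of its neighbours.
    order: list of query vertices giving the matching order.
    Returns (embeddings, intersections), where intersections counts how many
    times each set of already-matched data vertices was intersected.
    """
    embeddings = []
    intersections = Counter()

    def extend(mapping):
        depth = len(mapping)
        if depth == len(order):
            embeddings.append(dict(mapping))
            return
        u = order[depth]
        # Data vertices already matched to query neighbours of u.
        anchors = tuple(sorted(mapping[w] for w in query_adj[u] if w in mapping))
        if anchors:
            # The result depends only on `anchors`, yet it is recomputed
            # every time the search reaches this point.
            intersections[anchors] += 1
            candidates = set.intersection(*(data_adj[v] for v in anchors))
        else:
            candidates = set(data_adj)
        used = set(mapping.values())
        for v in sorted(candidates - used):
            mapping[u] = v
            extend(mapping)
            del mapping[u]

    extend({})
    return embeddings, intersections


# Hypothetical toy input: a 4-vertex query (a complete graph minus edge c-d)
# matched against a complete data graph on vertices 0..3.
query = {"a": {"b", "c", "d"}, "b": {"a", "c", "d"}, "c": {"a", "b"}, "d": {"a", "b"}}
data = {v: {w for w in range(4) if w != v} for v in range(4)}

embeddings, counts = backtrack_match(query, data, ["a", "b", "c", "d"])
print(len(embeddings), sum(counts.values()), len(counts))
```

On this toy input the search finds 24 embeddings and performs 40 intersections, of which only 10 are distinct. The candidates for `d` depend only on the matches of `a` and `b`, but they are recomputed for every choice of `c`. This is the kind of redundancy that computation sharing aims to remove.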



Published in

Proceedings of the ACM on Management of Data (PACMMOD), Volume 1, Issue 1
May 2023
2807 pages
EISSN: 2836-6573
DOI: 10.1145/3603164

Copyright © 2023 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States
