Abstract
Subgraph matching is one of the most important problems in graph analytics, and many algorithms and systems have been proposed for it. Most of these works follow Ullmann's backtracking approach because it is memory-efficient when handling an explosive number of intermediate matching results. However, they have largely overlooked an intrinsic problem of backtracking, namely repeated computation, which accounts for a large portion of the heavy computation in subgraph matching. This paper proposes a subgraph matching system, Circinus, which enables effective computation sharing through a new compression-based backtracking method. Our extensive experiments show that Circinus significantly reduces repeated computation, which translates into performance improvements of up to several orders of magnitude.
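To make the notion of repeated computation concrete, the following is a minimal sketch of plain backtracking subgraph matching. It is not Circinus's compression-based method, which the abstract does not describe in detail. The function `match`, its arguments, and the toy graphs are hypothetical names chosen for illustration. The sketch counts how often the same candidate set (identified by a query vertex and the data vertices matched to its already-placed neighbours) is computed.

```python
from collections import Counter

def match(query, data, order):
    """Enumerate injective embeddings of `query` in `data` by backtracking.

    `query` and `data` map each vertex to its set of neighbours (undirected).
    `order` is the matching order of the query vertices.
    Returns the embeddings and a Counter of candidate-set computations,
    keyed by (query vertex, data vertices matched to its placed neighbours).
    """
    calls = Counter()
    results = []

    def candidates(u, mapping):
        # Only already-matched neighbours of u constrain its candidates.
        parents = [p for p in order if p in mapping and p in query[u]]
        calls[(u, tuple(mapping[p] for p in parents))] += 1
        if not parents:
            return set(data)
        return set.intersection(*(data[mapping[p]] for p in parents))

    def extend(depth, mapping):
        if depth == len(order):
            results.append(dict(mapping))
            return
        u = order[depth]
        used = set(mapping.values())
        for v in sorted(candidates(u, mapping) - used):
            mapping[u] = v
            extend(depth + 1, mapping)
            del mapping[u]

    extend(0, {})
    return results, calls

# Hypothetical toy input: a star query (centre a, leaves b and c)
# matched against a star data graph (centre 0, leaves 1, 2, 3).
query = {"a": {"b", "c"}, "b": {"a"}, "c": {"a"}}
data = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
results, calls = match(query, data, ["a", "b", "c"])
repeated = sum(calls.values()) - len(calls)
print(len(results), "embeddings;", repeated, "repeated candidate computations")
```

In this toy input, the candidates of `c` depend only on the vertex matched to `a`, yet they are recomputed for every choice of `b`. Sharing such results across partial matches is the kind of saving the abstract refers to.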
Index Terms
- Circinus: Fast Redundancy-Reduced Subgraph Matching