
MWP: Multi-Window Parallel Evaluation of Regular Path Queries on Streaming Graphs

Published: 26 March 2024

Abstract

A persistent Regular Path Query (RPQ) on a streaming graph continuously finds every pair of vertices connected by a path in the graph within a sliding window, such that the sequence of edge labels along the path matches a given regular expression. The existing RPQ evaluation algorithm in the literature incrementally maintains a set of spanning-tree-like data structures to form query results quickly and to avoid reprocessing edges shared by multiple sliding windows. This approach allows the edges within a sliding window to be processed in parallel, but requires a blocking expiration phase between sliding windows to remove old edges. This blocking phase can significantly degrade query performance, especially when edges arrive quickly and the sliding windows overlap heavily.
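To make the query definition concrete, here is a minimal Python sketch that evaluates an RPQ on a single window snapshot by breadth-first search over pairs of (vertex, automaton state). It is a naive baseline that recomputes everything per window, not the incremental algorithm from the literature and not MWP. The function name `rpq_pairs`, the edge tuple layout `(source, label, target, timestamp)`, the half-open window `(window_start, window_end]`, and the dictionary encoding of the automaton are all assumptions made for this illustration. Paths may revisit vertices, and the empty path is not reported.

```python
from collections import deque

def rpq_pairs(edges, dfa, start_state, accept_states, window_start, window_end):
    """Return every (u, v) joined by a non-empty path whose label sequence
    is accepted by the DFA, using only edges with
    window_start < timestamp <= window_end."""
    adj = {}
    vertices = set()
    for u, label, v, ts in edges:
        if window_start < ts <= window_end:
            adj.setdefault(u, []).append((label, v))
            vertices.update((u, v))

    results = set()
    for src in vertices:
        # Search the product of the graph and the automaton from (src, start).
        seen = {(src, start_state)}
        queue = deque(seen)
        while queue:
            node, state = queue.popleft()
            for label, nxt in adj.get(node, []):
                nstate = dfa.get((state, label))
                if nstate is None:
                    continue  # the automaton has no transition on this label
                if nstate in accept_states:
                    results.add((src, nxt))
                if (nxt, nstate) not in seen:
                    seen.add((nxt, nstate))
                    queue.append((nxt, nstate))
    return results

# Hypothetical data: the regular expression "a b*" as a two-state DFA.
DFA = {(0, "a"): 1, (1, "b"): 1}
EDGES = [("x", "a", "y", 1), ("y", "b", "z", 2), ("z", "b", "w", 5)]

print(rpq_pairs(EDGES, DFA, 0, {1}, 0, 5))
print(rpq_pairs(EDGES, DFA, 0, {1}, 1, 5))
```

With the window `(0, 5]` all three edges are live and `x` reaches `y`, `z`, and `w`. Once the window slides to `(1, 5]`, the only `a` edge has expired and no pair matches, which is the kind of change an expiration phase has to account for.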

This paper presents a new RPQ evaluation strategy, the Multi-Window Parallel (MWP) method, which leverages a new data structure called the Timestamped Rooted Digraph (TRD). The novel idea is to incrementally maintain TRDs that, like the aforementioned spanning trees, support quick formulation of query results, but that simultaneously hold the information needed for multiple sliding windows. MWP thereby eliminates the forced blocking expiration phase. Only when memory runs low is a quick "dirty garbage collection" (DGC) process run to remove some unneeded edges and nodes from the TRDs, without incurring large costs. Extensive experiments on real graph datasets show that MWP significantly outperforms the existing algorithm in throughput, tail latency, and scalability, and that DGC is an effective way to release memory with minimal impact.
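The abstract does not define the TRD, so the following Python sketch is not the paper's data structure. It only illustrates, under stated assumptions, how timestamps kept inside one structure can answer several overlapping windows without first deleting expired edges. For a fixed source vertex it records, for each reachable (vertex, automaton state) pair, the best achievable "oldest edge timestamp" over all matching paths, computed with a standard widest-path variant of Dijkstra's algorithm. The names `bottleneck_timestamps` and `targets_in_window`, the edge tuple layout, and the assumption that every stored edge is no newer than the window end are all hypothetical.

```python
import heapq

def bottleneck_timestamps(edges, dfa, start_state, source):
    """Map each (vertex, state) reachable from (source, start_state) by a
    non-empty path to the largest possible minimum edge timestamp."""
    adj = {}
    for u, label, v, ts in edges:
        adj.setdefault(u, []).append((label, v, ts))

    best = {}
    # Max-heap via negated keys; the empty path has bottleneck +infinity.
    heap = [(-float("inf"), source, start_state)]
    while heap:
        neg_bn, node, state = heapq.heappop(heap)
        bn = -neg_bn
        if bn < best.get((node, state), float("-inf")):
            continue  # stale heap entry, a better value was already found
        for label, nxt, ts in adj.get(node, []):
            nstate = dfa.get((state, label))
            if nstate is None:
                continue
            cand = min(bn, ts)  # oldest edge on the extended path
            if cand > best.get((nxt, nstate), float("-inf")):
                best[(nxt, nstate)] = cand
                heapq.heappush(heap, (-cand, nxt, nstate))
    return best

def targets_in_window(best, accept_states, window_start):
    """Vertices matched by a path whose edges are all newer than window_start."""
    return {v for (v, state), bn in best.items()
            if state in accept_states and bn > window_start}

# Hypothetical data: the regular expression "a b*" as a two-state DFA.
DFA = {(0, "a"): 1, (1, "b"): 1}
EDGES = [("x", "a", "y", 1), ("y", "b", "z", 2),
         ("z", "b", "w", 5), ("x", "a", "z", 4)]

best = bottleneck_timestamps(EDGES, DFA, 0, "x")
print(targets_in_window(best, {1}, 0))
print(targets_in_window(best, {1}, 2))
```

The same `best` map serves both windows: the window starting at 0 yields `y`, `z`, and `w`, while the window starting at 2 yields only `z` and `w`, because the path through `y` depends on an edge with timestamp 1. Nothing is deleted between the two lookups. Stale entries could be pruned later when memory is tight, which is broadly the role the abstract assigns to DGC.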


    • Published in

      Proceedings of the ACM on Management of Data, Volume 2, Issue 1
      PACMMOD
      February 2024
      1874 pages
      EISSN: 2836-6573
      DOI: 10.1145/3654807

      Copyright © 2024 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States