Abstract
A persistent Regular Path Query (RPQ) on a streaming graph continuously finds every pair of vertices connected, within a sliding window, by a path whose edge-label sequence matches a given regular expression. The existing RPQ evaluation algorithm in the literature incrementally maintains a set of spanning-tree-like data structures to form query results quickly and to avoid reprocessing edges shared by multiple sliding windows. This approach allows the graph edges within a sliding window to be processed in parallel but requires a blocking expiration phase between sliding windows to remove old edges. This blocking phase can significantly degrade query performance, especially when edges arrive quickly and the sliding windows overlap heavily.
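To make the query semantics concrete, the following is a minimal sketch that evaluates an RPQ on a single window snapshot by breadth-first search over the product of the graph and a deterministic automaton. It is a naive batch evaluation for illustration only, not the incremental spanning-tree algorithm described above. The function name `rpq_window`, the edge tuple layout, and the dictionary encoding of the automaton are all assumptions made for this example; paths may revisit vertices, and the empty path is not reported.

```python
from collections import deque


def rpq_window(edges, dfa, start_state, accepting, window_start, window_end):
    """Return all (u, v) pairs joined by a path whose labels the DFA accepts.

    edges: iterable of (src, dst, label, timestamp) tuples (assumed layout)
    dfa:   dict mapping (state, label) -> next state (assumed encoding)
    """
    # Keep only edges whose timestamp falls inside the window.
    adj = {}
    vertices = set()
    for src, dst, label, ts in edges:
        if window_start <= ts <= window_end:
            adj.setdefault(src, []).append((dst, label))
            vertices.update((src, dst))

    results = set()
    for root in vertices:
        # BFS over (vertex, automaton state) pairs reachable from the root.
        seen = {(root, start_state)}
        queue = deque(seen)
        while queue:
            v, q = queue.popleft()
            for w, label in adj.get(v, []):
                nq = dfa.get((q, label))
                if nq is None:
                    continue  # the automaton has no transition on this label
                if nq in accepting:
                    results.add((root, w))
                if (w, nq) not in seen:
                    seen.add((w, nq))
                    queue.append((w, nq))
    return results
```

For the expression `a b*`, encoded as `{(0, "a"): 1, (1, "b"): 1}` with accepting state `1`, calling the function once per window recomputes everything from scratch. That repeated work is what incremental maintenance is meant to avoid.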
This paper presents a new RPQ evaluation strategy, the Multi-Window Parallel (MWP) method, which leverages a new data structure called the Timestamped Rooted Digraph (TRD). The novel idea is to incrementally maintain TRDs that, like the aforementioned spanning trees, support quick formation of query results, but that simultaneously hold the information needed for multiple sliding windows. MWP eliminates the forced blocking expiration phase. Only when memory runs low does a quick "dirty garbage collection" (DGC) process remove some unneeded edges and nodes from the TRDs, without incurring large costs. Extensive experiments on real graph datasets show that MWP significantly outperforms the existing algorithm in throughput, tail latency, and scalability, and that DGC releases memory effectively with minimal impact.
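The general idea of serving several windows from one timestamped structure, with cleanup deferred until memory runs low, can be sketched as follows. This is a deliberately simplified illustration and not the paper's TRD or DGC: it stores a flat list of timestamped edges, filters by timestamp at query time, and prunes only when a hypothetical edge budget is exceeded. The class `TimestampedEdgeStore`, its `capacity` budget, and the `oldest_active_end` field are all assumptions introduced for this example.

```python
class TimestampedEdgeStore:
    """Flat store of timestamped edges shared by several sliding windows."""

    def __init__(self, window_size, capacity):
        self.window_size = window_size
        self.capacity = capacity        # hypothetical memory budget, in edges
        self.edges = []                 # (src, dst, label, timestamp)
        # End time of the oldest window still being evaluated, set by the
        # caller; None means nothing may be discarded yet.
        self.oldest_active_end = None

    def insert(self, src, dst, label, ts):
        self.edges.append((src, dst, label, ts))
        # No expiration step per window: clean up only when over budget.
        if len(self.edges) > self.capacity:
            self.collect()

    def snapshot(self, window_end):
        """Edges visible to the window (window_end - window_size, window_end]."""
        start = window_end - self.window_size
        return [e for e in self.edges if start < e[3] <= window_end]

    def collect(self):
        """Drop edges that no active window can still see; return the count."""
        if self.oldest_active_end is None:
            return 0
        cutoff = self.oldest_active_end - self.window_size
        before = len(self.edges)
        self.edges = [e for e in self.edges if e[3] > cutoff]
        return before - len(self.edges)
```

Because `snapshot` filters by timestamp, two overlapping windows can read the same store concurrently, and stale edges left behind between collections are simply ignored rather than removed eagerly.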
Index Terms
- MWP: Multi-Window Parallel Evaluation of Regular Path Queries on Streaming Graphs