Distributed processing of regular path queries in RDF graphs

  • Regular Paper
  • Knowledge and Information Systems

Abstract

SPARQL 1.1 offers a type of navigational query for RDF systems called the regular path query (RPQ). A regular path query retrieves node pairs that are connected by paths whose edge labels satisfy a given regular expression. Regular path queries are difficult to evaluate efficiently because the search space can be very large, and no scalable, practical solution has been available so far. In this paper, we present Leon+, an in-memory distributed framework that addresses the RPQ problem in the context of knowledge graphs. To reduce the search space and mitigate mounting communication costs, Leon+ takes advantage of join-ahead pruning via a novel RDF summarization technique together with a path partitioning strategy. We also develop a cost model to devise query plans that achieve high efficiency for complex RPQs. As no RPQ benchmark has been available, we create micro-benchmarks on both synthetic and real-world datasets. We present a thorough experimental comparison of our approach with state-of-the-art RDF stores. The results show that our approach is 5x faster than the competitors on single RPQs. For query workloads, it saves up to 1/2 of the time and 2/3 of the communication overhead compared with the baseline method.
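To make the definition of a regular path query concrete, the following is a minimal single-machine sketch that evaluates one RPQ by searching the product of the graph and a finite automaton. It is a textbook baseline, not the Leon+ method: it uses no summarization, partitioning, or cost model. The toy triples, the hand-built automaton for `worksFor/subOrganizationOf+`, and the function name `evaluate_rpq` are all assumptions made for illustration, and expressions that match the empty path are out of scope.

```python
from collections import deque

# Toy edge-labelled graph as (subject, predicate, object) triples.
# All node names are hypothetical.
TRIPLES = [
    ("alice", "worksFor", "groupA"),
    ("groupA", "subOrganizationOf", "deptCS"),
    ("deptCS", "subOrganizationOf", "univ0"),
    ("bob", "worksFor", "deptCS"),
]

# Hand-built NFA for worksFor/subOrganizationOf+ :
# 0 --worksFor--> 1 --subOrganizationOf--> 2 --subOrganizationOf--> 2
NFA = {
    (0, "worksFor"): {1},
    (1, "subOrganizationOf"): {2},
    (2, "subOrganizationOf"): {2},
}
START, ACCEPTING = 0, {2}


def evaluate_rpq(triples, nfa, start, accepting):
    """Return all (x, y) pairs joined by a non-empty path the NFA accepts."""
    out_edges = {}
    for s, p, o in triples:
        out_edges.setdefault(s, []).append((p, o))
    nodes = {s for s, _, _ in triples} | {o for _, _, o in triples}

    answers = set()
    for source in nodes:
        # Breadth-first search over (graph node, automaton state) pairs.
        seen = {(source, start)}
        queue = deque(seen)
        while queue:
            node, state = queue.popleft()
            for label, target in out_edges.get(node, []):
                for next_state in nfa.get((state, label), ()):
                    if next_state in accepting:
                        answers.add((source, target))
                    pair = (target, next_state)
                    if pair not in seen:  # guards against cycles
                        seen.add(pair)
                        queue.append(pair)
    return answers


result = evaluate_rpq(TRIPLES, NFA, START, ACCEPTING)
print(sorted(result))
```

Because the search restarts from every node and explores every reachable (node, state) pair, its cost grows quickly with graph size, which illustrates the large search space mentioned above.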

References

  1. Apache jena. http://jena.apache.org/

  2. Barton. http://dslam.cs.umd.edu/data/barton/

  3. Dblp. https://dblp.uni-trier.de/

  4. Dbpedia. https://wiki.dbpedia.org/

  5. Lubm. http://swat.cse.lehigh.edu/projects/lubm/

  6. Mpich. https://www.mpich.org/

  7. Property path. http://www.w3.org/TR/sparql11-property-paths/

  8. Rdf. http://www.w3.org/TR/rdf-concepts/

  9. Sparql. http://www.w3.org/TR/rdf-sparql-query/

  10. Uniprot.

  11. Watdiv. dsg.uwaterloo.ca/watdiv/

  12. Yago2.

  13. Abul-Basher Z, Yakovets N, Godfrey P, Ghajar-Khosravi S, Chignell MH (2017) Tasweet: optimizing disjunctive regular path queries in graph databases. In: EDBT/ICDT 2017 joint conference 20th international conference on extending database technology. https://doi.org/10.5441/002/edbt.2017.47

  14. Al-Harbi R, Abdelaziz I, Kalnis P, Mamoulis N, Ebrahim Y, Sahli M (2016) Accelerating sparql queries by exploiting hash-based locality and adaptive partitioning. VLDB J 25:355–380

  15. Andreev K, Räcke H (2004) Balanced graph partitioning. Theory Comput Syst 39:929–939. https://doi.org/10.1145/1007912.1007931

  16. Arias M, Fernández JD, Martínez-Prieto MA, Fuente P (2011) An empirical study of real-world sparql queries. arXiv:1103.5043

  17. Baier J, Daroch D, Reutter JL, Vrgoč D (2017) Evaluating navigational RDF queries over the web. In: Proceedings of the 28th ACM conference on hypertext and social media-HT ’17. ACM Press. https://doi.org/10.1145/3078714.3078731

  18. Bonifati A, Martens W, Timm T (2019) An analytical study of large SPARQL query logs. Springer, Berlin

  19. Dey S, Cuevas-Vicenttín V, Köhler S, Gribkoff E, Wang M, Ludäscher B (2013) On implementing provenance-aware regular path queries with relational query engines. In: Proceedings of the joint EDBT/ICDT 2013 workshops on–EDBT ’13. ACM Press. https://doi.org/10.1145/2457317.2457353

  20. Erling O, Mikhailov I (2009) Virtuoso: RDF support in a native RDBMS. In: Semantic web information management, pp 501–519. Springer, Berlin. https://doi.org/10.1007/978-3-642-04329-1_21

  21. Even G, Naor JS, Rao S, Schieber B (1999) Fast approximate graph partitioning algorithms. Society for Industrial & Applied Mathematics (SIAM), pp. 2187–2214. https://doi.org/10.1137/s0097539796308217

  22. Fan W, Li J, Ma S, Tang N, Wu Y (2011) Adding regular expressions to graph reachability and pattern queries. In: 2011 IEEE 27th International Conference on Data Engineering. IEEE. https://doi.org/10.1109/icde.2011.5767858

  23. Fletcher GHL, Peters J, Poulovassilis A (2016) Efficient regular path query evaluation using path indexes. In: EDBT. https://doi.org/10.5441/002/edbt.2016.67

  24. Garey MR, Johnson DS (1990) Computers and intractability: a guide to the theory of NP-completeness. W. H. Freeman & Co., USA. https://doi.org/10.5555/574848

  25. Gubichev A, Bedathur SJ, Seufert S (2013) Sparqling kleene: fast property paths in rdf-3x. In: First international workshop on graph data management experiences and systems–GRADES ’13. ACM Press. https://doi.org/10.1145/2484425.2484443

  26. Guo X, Gao H, Zou Z (2019) Leon: a distributed RDF engine for multi-query processing. In: Database systems for advanced applications, pp. 742–759. Springer, Berlin. https://doi.org/10.1007/978-3-030-18576-3_44

  27. Gurajada S, Seufert S, Miliaraki I, Theobald M (2014) Triad: a distributed shared-nothing rdf engine based on asynchronous message passing. In: SIGMOD conference. https://doi.org/10.1145/2588555.2610511

  28. Hellmann S, Stadler C, Lehmann J, Auer S (2009) Dbpedia live extraction. In: OTM conferences. https://doi.org/10.1007/978-3-642-05151-7_33

  29. Karypis G, Kumar V (1998) A fast and high quality multilevel scheme for partitioning irregular graphs. pp. 359–392. https://doi.org/10.1137/s1064827595287997

  30. Konstas I, Stathopoulos V, Jose JM (2009) On social networks and collaborative recommendation. In: Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval–SIGIR ’09. ACM Press. https://doi.org/10.1145/1571941.1571977

  31. Koschmieder A, Leser U (2012) Regular path queries on large graphs. In: Lecture notes in computer science, pp 177–194. Springer, Berlin. https://doi.org/10.1007/978-3-642-31235-9_12

  32. Losemann K, Martens W (2012) The complexity of evaluating path expressions in SPARQL. In: Proceedings of the 31st symposium on Principles of Database Systems–PODS ’12. ACM Press. https://doi.org/10.1145/2213556.2213573

  33. Meimaris M, Papastefanatos G, Mamoulis N, Anagnostopoulos I (2017) Extended characteristic sets: graph indexing for SPARQL query optimization. In: 2017 IEEE 33rd international conference on data engineering (ICDE). IEEE. https://doi.org/10.1109/icde.2017.106

  34. Mendelzon AO, Wood PT (1995) Finding regular simple paths in graph databases. Society for Industrial & Applied Mathematics (SIAM), pp 1235–1258 https://doi.org/10.1137/s009753979122370x

  35. Neumann T, Moerkotte G (2011) Characteristic sets: accurate cardinality estimation for RDF queries with multiple joins. In: 2011 IEEE 27th international conference on data engineering ICDE. IEEE. https://doi.org/10.1109/icde.2011.5767868

  36. Neumann T, Weikum G (2009) The rdf-3x engine for scalable management of rdf data. VLDB J 19:91–113

  37. Scott J, Ideker T, Karp RM, Sharan R (2006) Efficient algorithms for detecting signaling pathways in protein interaction networks. J Comput Biol 13(2):133–144

  38. Selmer P, Poulovassilis A, Wood PT (2015) Implementing flexible operators for regular path queries. CEUR Workshop Proc 1330:149–156

  39. Seufert S, Anand A, Bedathur S, Weikum G (2013) FERRARI: Flexible and efficient reachability range assignment for graph indexing. In: 2013 IEEE 29th international conference on data engineering (ICDE). IEEE. https://doi.org/10.1109/icde.2013.6544893

  40. Tetzel F, Voigt H, Paradies M, Lehner W (2017) An analysis of the feasibility of graph compression techniques for indexing regular path queries. In: Proceedings of the fifth international workshop on graph data-management experiences & systems–GRADES’17. ACM Press. https://doi.org/10.1145/3078447.3078458

  41. Thompson K (1968) Programming techniques: Regular expression search algorithm. Commun ACM 11(6):419–422

  42. Valstar LD, Fletcher GH, Yoshida Y (2017) Landmark indexing for evaluation of label-constrained reachability queries. In: Proceedings of the 2017 ACM international conference on management of data–SIGMOD ’17. ACM Press. https://doi.org/10.1145/3035918.3035955

  43. Wadhwa S, Prasad A, Ranu S, Bagchi A, Bedathur S (2019) Efficiently answering regular simple path queries on large labeled networks. In: Proceedings of the 2019 international conference on management of data—SIGMOD ’19. ACM Press. https://doi.org/10.1145/3299869.3319882

  44. Yakovets N, Godfrey P, Gryz J (2013) Evaluation of sparql property paths via recursive sql. AMW 1087

  45. Yakovets N, Godfrey P, Gryz J (2016) Query planning for evaluating SPARQL property paths. In: Proceedings of the 2016 international conference on management of data–SIGMOD ’16. ACM Press. https://doi.org/10.1145/2882903.2882944

  46. Zou L, Xu K, Yu JX, Chen L, Xiao Y, Zhao D (2014) Efficient processing of label-constraint reachability queries in large graphs. Elsevier, Amsterdam, pp. 47–66. https://doi.org/10.1016/j.is.2013.10.003

Acknowledgements

This work is supported by the Joint Funds of the National Natural Science Foundation of China (No. U19A2059), the National Key Research and Development Program of China (No. 2019YFB2101902), and the National Natural Science Foundation of China (Nos. 61532015 and 61672189).

Author information

Corresponding author

Correspondence to Hong Gao.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This paper is an extended version of our previous conference paper [26].

Appendices

Appendix

LUBM RPQ

prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

prefix ub: <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#>

Q1: SELECT * WHERE {?x ub:subOrganizationOf+ ?y. }

Q2: SELECT * WHERE {?x ub:worksFor/ub:subOrganizationOf+ ?y. }

Q3: SELECT * WHERE {?x ub:headOf/ub:subOrganizationOf+/ub:name ?y . }

Q4: SELECT * WHERE {?x (ub:headOf|ub:subOrganizationOf|ub:memberOf)+ ?y . }

Q5: SELECT * WHERE { ?x rdf:type ub:ResearchGroup . ?x ub:subOrganizationOf+ ?y. ?y rdf:type ub:University. }

Q6: SELECT * WHERE { ?x rdf:type ub:FullProfessor. ?x ub:headOf ?d. ?d ub:subOrganizationOf+ ?y. ?y rdf:type ub:University. }

Q7: SELECT * WHERE { ?r1 rdf:type ub:ResearchGroup . ?r1 ub:subOrganizationOf+ ?y. ?y rdf:type ub:University . ?r2 rdf:type ub:ResearchGroup. ?r2 ub:subOrganizationOf+ ?y. }
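The path operators used in the queries above can be read as operations on binary relations: `/` is composition, `|` is union, and `+` is transitive closure. The sketch below illustrates that reading on a handful of hypothetical LUBM-style triples, evaluating the path shapes of Q1, Q2 and Q4. The helper names `edges`, `compose` and `plus` and the toy data are assumptions for illustration only; this is not how Leon+ evaluates these queries.

```python
# Hypothetical LUBM-style triples (prefixes omitted for brevity).
TRIPLES = [
    ("prof1", "headOf", "dept0"),
    ("prof1", "worksFor", "dept0"),
    ("group0", "subOrganizationOf", "dept0"),
    ("dept0", "subOrganizationOf", "univ0"),
    ("student1", "memberOf", "group0"),
]


def edges(triples, predicate):
    """Binary relation {(subject, object)} for a single predicate."""
    return {(s, o) for s, p, o in triples if p == predicate}


def compose(left, right):
    """Sequence operator '/': (x, z) such that (x, y) in left and (y, z) in right."""
    by_source = {}
    for y, z in right:
        by_source.setdefault(y, set()).add(z)
    return {(x, z) for x, y in left for z in by_source.get(y, ())}


def plus(relation):
    """One-or-more operator '+': transitive closure by semi-naive iteration."""
    closure = set(relation)
    delta = set(relation)
    while delta:
        # Extend only the newly found pairs; stop when nothing new appears.
        delta = compose(delta, relation) - closure
        closure |= delta
    return closure


sub = edges(TRIPLES, "subOrganizationOf")

# Q1 shape: ?x subOrganizationOf+ ?y
q1 = plus(sub)

# Q2 shape: ?x worksFor/subOrganizationOf+ ?y
q2 = compose(edges(TRIPLES, "worksFor"), q1)

# Q4 shape: ?x (headOf|subOrganizationOf|memberOf)+ ?y
q4 = plus(edges(TRIPLES, "headOf") | sub | edges(TRIPLES, "memberOf"))

print(sorted(q1))
print(sorted(q2))
print(len(q4))
```

Subtracting the pairs already in the closure on each round keeps the loop from revisiting known results and guarantees termination on cyclic graphs.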

YAGO2 RPQ

base <http://yago-knowledge.org/resource/>

Q1: SELECT * WHERE {?x hasCapital+ ?y . }

Q2: SELECT * WHERE {?x hasAcademicAdvisor+ ?y . }

Q3: SELECT * WHERE {?x created+ ?y . }

Q4: SELECT * WHERE {?x hasChild+ ?y . }

Q5: SELECT * WHERE {?x influences+ ?y . }

Q6: SELECT * WHERE {?x dealsWith+/hasCapital+ ?y . }

Q7: SELECT * WHERE {?x influences+/isMarriedTo+ ?y . }

Q8: SELECT * WHERE {?x isMarriedTo+/hasChild+ ?y . }

Q9: SELECT * WHERE {?x isKnownFor+/isMarriedTo+/hasChild+ ?y . }

Q10: SELECT * WHERE {?x happenedIn+/dealsWith+/hasCapital+ ?y . }

Q11: SELECT * WHERE {?x (isConnectedTo/isLocatedIn/owns)+ ?y . }

Q12: SELECT * WHERE {?x (dealsWith/participatedIn/isLocatedIn)+ ?y . }

Q13: SELECT * WHERE {?x (isLeaderOf/dealsWith/participatedIn)+ ?y . }

Q14: SELECT * WHERE {?x (happenedIn|hasCapital|participatedIn)+ ?y . }

Q15: SELECT * WHERE {?x isCitizenOf/dealsWith+/hasCapital ?y . }

Q16: SELECT * WHERE {?x isLocatedIn+ ?r. ?x (dealsWith/hasCapital/isLocatedIn)+ ?y.}

Cite this article

Guo, X., Gao, H. & Zou, Z. Distributed processing of regular path queries in RDF graphs. Knowl Inf Syst 63, 993–1027 (2021). https://doi.org/10.1007/s10115-020-01536-2
