FPIRPQ: Accelerating regular path queries on knowledge graphs

World Wide Web

Abstract

With the growing popularity and application of knowledge-based artificial intelligence, the scale of knowledge graph data is increasing dramatically. Regular Path Queries (RPQs), which explore RDF graphs in a navigational manner, are an essential type of query for RDF graphs and have attracted increasing research effort. Moreover, path indexes have proven successful for semi-structured data management. However, few techniques can be used effectively in practice to process RPQs on large-scale knowledge graphs. In this paper, we propose a novel indexing solution named FPIRPQ (Frequent Path Index for Regular Path Queries) that leverages Frequent Path Mining (FPM). Unlike existing approaches to RPQ processing, FPIRPQ takes advantage of frequent paths, which are statistically derived from the data, to accelerate RPQs. Furthermore, since there is not yet an explicit benchmark targeted at RPQs over RDF graphs, we design a micro-benchmark of 12 basic queries over synthetic and real-world datasets. The experimental results show that FPIRPQ improves query efficiency by up to orders of magnitude compared to the state-of-the-art RDF storage engine.
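The abstract describes the frequent-path idea only at a high level. Below is a minimal Python sketch of a generic frequent-path index. It is not the FPIRPQ algorithm from the paper. The sketch makes several assumptions:

- RDF data is a list of (subject, predicate, object) triples.
- A hypothetical length bound `max_len` and support threshold `min_support` control indexing.
- "Support" is simply the number of distinct (start, end) node pairs a label path connects.
- Queries are restricted to plain label concatenations, with no alternation or Kleene star.

The function names and the toy triples are hypothetical.

```python
from collections import defaultdict


def label_paths(edges, max_len):
    """Map each label sequence of length <= max_len to the set of
    (start, end) node pairs connected by a walk with exactly those labels."""
    out_edges = defaultdict(list)
    for s, p, o in edges:
        out_edges[s].append((p, o))
    paths = defaultdict(set)
    frontier = [((p,), s, o) for s, p, o in edges]
    while frontier:
        next_frontier = []
        for labels, start, end in frontier:
            paths[labels].add((start, end))
            if len(labels) < max_len:
                # Extend the walk by every outgoing edge of its end node.
                for p, o in out_edges[end]:
                    next_frontier.append((labels + (p,), start, o))
        frontier = next_frontier
    return paths


def build_index(edges, max_len, min_support):
    """Keep every single label, plus longer label paths whose support
    (assumed here: number of distinct node pairs) reaches min_support."""
    return {
        labels: pairs
        for labels, pairs in label_paths(edges, max_len).items()
        if len(labels) == 1 or len(pairs) >= min_support
    }


def evaluate(index, query):
    """Answer a concatenation query (list of labels) by greedily covering it
    with the longest indexed segments and joining their node pairs."""
    result = None
    i = 0
    while i < len(query):
        for j in range(len(query), i, -1):
            segment = tuple(query[i:j])
            if segment in index:
                break
        else:
            return set()  # some label never occurs in the graph
        pairs = index[segment]
        if result is None:
            result = set(pairs)
        else:
            # Join (x, y) from the prefix with (y, z) from this segment.
            by_start = defaultdict(set)
            for y, z in pairs:
                by_start[y].add(z)
            result = {(x, z) for x, y in result for z in by_start.get(y, ())}
        i = j
    return result if result is not None else set()


# Hypothetical toy triples, loosely in the style of LUBM.
edges = [
    ("alice", "advisor", "bob"),
    ("carol", "advisor", "bob"),
    ("bob", "worksFor", "dept1"),
    ("dept1", "subOrgOf", "univ1"),
]
index = build_index(edges, max_len=2, min_support=2)
print(sorted(evaluate(index, ["advisor", "worksFor", "subOrgOf"])))
# [('alice', 'univ1'), ('carol', 'univ1')]
```

In this toy graph, `advisor/worksFor` connects two node pairs, so it is indexed as a unit. `worksFor/subOrgOf` connects only one pair, so it is not indexed. Single labels are always kept, so any concatenation can still be answered by joins. Indexing frequent longer paths only removes join steps.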

Figures and algorithms (images not reproduced): Fig. 1 to Fig. 7, Algorithms 1 and 2, and two functions.

Data availability

The queries (Q1 to Q12) designed on LUBM and DBpedia are available on GitHub (https://github.com/haowq0417/FPIRPQ), and all other data generated or analyzed during this study are included in this published article.

Notes

  1. https://github.com/haowq0417/FPIRPQ

References

  1. Ernst, P., Meng, C., Siu, A., Weikum, G.: KnowLife: A knowledge graph for health and life sciences. IEEE Computer Society (2014)

  2. Shi, L., Li, S., Yang, X., Qi, J., Pan, G., Zhou, B.: Semantic health knowledge graph: semantic integration of heterogeneous medical knowledge and services. BioMed Research International 2017 (2017)

  3. Rotmensch, M., Halpern, Y., Tlimat, A., Horng, S., Sontag, D.: Learning a health knowledge graph from electronic medical records. Scientific Reports 7(1), 1–11 (2017)

  4. Liu, J., Lu, Z., Du, W.: Combining enterprise knowledge graph and news sentiment analysis for stock price prediction. In: Proceedings of the 52nd Hawaii International Conference on System Sciences (2019)

  5. Ulicny, B.: Constructing knowledge graphs with trust. In: 4th International Workshop on Methods for Establishing Trust of (Open) Data, Bethlehem, USA (2015)

  6. Chen, P., Lu, Y., Zheng, V.W., Chen, X., Yang, B.: KnowEdu: a system to construct knowledge graph for education. IEEE Access 6, 31553–31563 (2018)

  7. Grévisse, C., Manrique, R., Mariño, O., Rothkugel, S.: Knowledge graph-based teacher support for learning material authoring. In: Colombian Conference on Computing, pp 177–191. Springer (2018)

  8. World Wide Web Consortium (W3C): RDF 1.1 concepts and abstract syntax (2014)

  9. World Wide Web Consortium (W3C): SPARQL 1.1 query language (2013)

  10. Kostylev, E.V., Reutter, J.L., Romero, M., Vrgoč, D.: SPARQL with property paths. In: International Semantic Web Conference, pp 3–18. Springer (2015)

  11. Wang, X., Wang, S., Xin, Y., Yang, Y., Li, J., Wang, X.: Distributed Pregel-based provenance-aware regular path query processing on RDF knowledge graphs. World Wide Web, 1–32 (2019)

  12. Liu, B., Wang, X., Liu, P., Li, S., Wang, X.: PAIRPQ: An efficient path index for regular path queries on knowledge graphs. In: Asia-Pacific Web (APWeb) and Web-Age Information Management (WAIM) Joint International Conference on Web and Big Data, pp 106–120. Springer (2021)

  13. Yan, X., Han, J.: gSpan: Graph-based substructure pattern mining. In: 2002 IEEE International Conference on Data Mining, 2002. Proceedings, pp 721–724. IEEE (2002)

  14. Holder, L.B., Cook, D.J., Djoko, S., et al.: Substructure discovery in the SUBDUE system. In: KDD Workshop, pp 169–180, Washington, DC, USA (1994)

  15. Ghazizadeh, S., Chawathe, S.S.: SEuS: Structure extraction using summaries. In: International Conference on Discovery Science, pp 71–85. Springer (2002)

  16. Goldman, R., Widom, J.: DataGuides: Enabling Query Formulation and Optimization in Semistructured Databases. Technical report, Stanford (1997)

  17. Goldman, R.: Approximate DataGuides. Workshop on Query Processing for Semistructured Data and Non-Standard Data Formats. http://www-db.stanford.edu/pub/papers/adg.ps (1999)

  18. Hopcroft, J.E., Motwani, R., Ullman, J.D.: Introduction to automata theory, languages, and computation. ACM SIGACT News 32(1), 60–65 (2001)

  19. Milo, T., Suciu, D.: Index structures for path expressions. In: International Conference on Database Theory, pp 277–295. Springer (1999)

  20. Kaushik, R., Shenoy, P., Bohannon, P., Gudes, E.: Exploiting local similarity for indexing paths in graph-structured data. In: Proceedings 18th International Conference on Data Engineering, pp 129–140. IEEE (2002)

  21. Chen, Q., Lim, A., Ong, K.W.: D(k)-index: An adaptive structural summary for graph-structured data. In: Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data, pp 134–144 (2003)

  22. He, H., Yang, J.: Multiresolution indexing of XML for frequent queries. In: Proceedings. 20th International Conference on Data Engineering, pp 683–694. IEEE (2004)

  23. Erling, O., Mikhailov, I.: RDF support in the Virtuoso DBMS. In: Networked Knowledge-Networked Media, pp 7–24. Springer (2009)

  24. Das, S., Agrawal, D., El Abbadi, A.: G-store: A scalable data store for transactional multi key access in the cloud. In: Proceedings of the 1st ACM Symposium on Cloud Computing, pp 163–174 (2010)

  25. Liu, B., Wang, X., Liu, P., Li, S., Zhang, X., Yang, Y.: Knowledge graph database system with unified model and query languages. Ruan Jian Xue Bao/Journal of Software (in Chinese) 32(3), 781–804 (2021)

  26. Brzozowski, J.A.: Derivatives of regular expressions. Journal of the ACM (JACM) 11(4), 481–494 (1964)

  27. Zhu, F., Qu, Q., Lo, D., Yan, X., Han, J., Yu, P.S.: Mining top-k large structural patterns in a massive network. Proceedings of the VLDB Endowment 4(11), 807–818 (2011)

  28. Vanetik, N., Gudes, E., Shimony, S.E.: Computing frequent graph patterns from semistructured data. In: 2002 IEEE International Conference on Data Mining, 2002. Proceedings, pp 458–465. IEEE (2002)

  29. Bonifati, A., Martens, W., Timm, T.: An analytical study of large sparql query logs. VLDB J. 29(2), 655–679 (2020)

  30. Guo, Y., Pan, Z., Heflin, J.: LUBM: a benchmark for OWL knowledge base systems. Journal of Web Semantics 3(2-3), 158–182 (2005)

  31. Lehmann, J., Isele, R., Jakob, M., Jentzsch, A., Kontokostas, D., Mendes, P.N., Hellmann, S., Morsey, M., Van Kleef, P., Auer, S., et al.: DBpedia: a large-scale, multilingual knowledge base extracted from Wikipedia. Semantic Web 6(2), 167–195 (2015)

Acknowledgments

This work extends PAIRPQ: An Efficient Path Index for Regular Path Queries on Knowledge Graphs [12], and is supported by the National Key Research and Development Program of China (2019YFE0198600) and the National Natural Science Foundation of China (61972275).

Funding

This work is supported by the National Key Research and Development Program of China (2019YFE0198600) and the National Natural Science Foundation of China (61972275).

Author information

Contributions

Xin Wang and Wenqi Hao are the major contributors in writing the manuscript and preparing the figures. Yuzhou Qin and Baozhu Liu participated in the experiments and analyzed the results. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Xiaofei Wang.

Ethics declarations

Human and animal ethics

Not applicable

Ethics approval and consent to participate

Not applicable

Consent for publication

Not applicable

Competing interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article belongs to the Topical Collection: APWeb-WAIM 2021

Guest Editors: Yi Cai, Leong Hou U, Marc Spaniol, Yasushi Sakurai

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wang, X., Hao, W., Qin, Y. et al. FPIRPQ: Accelerating regular path queries on knowledge graphs. World Wide Web 26, 661–681 (2023). https://doi.org/10.1007/s11280-022-01103-5

  • DOI: https://doi.org/10.1007/s11280-022-01103-5

Keywords