Abstract
With the growing popularity and application of knowledge-based artificial intelligence, the scale of knowledge graph data is increasing dramatically. Regular Path Queries (RPQs), which explore RDF graphs in a navigational manner, are an essential type of query for RDF graphs and have attracted increasing research effort. Moreover, path indexes have proven successful for semi-structured data management. However, few techniques can be used effectively in practice for processing RPQs on large-scale knowledge graphs. In this paper, we propose a novel indexing solution named FPIRPQ (Frequent Path Index for Regular Path Queries) that leverages Frequent Path Mining (FPM). Unlike existing approaches to RPQ processing, FPIRPQ takes advantage of frequent paths, which are statistically derived from the data, to accelerate RPQs. Furthermore, since there is not yet an explicit benchmark targeting RPQs over RDF graphs, we design a micro-benchmark comprising 12 basic queries over synthetic and real-world datasets. The experimental results show that FPIRPQ improves query efficiency by up to orders of magnitude compared with the state-of-the-art RDF storage engine.
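To make the idea of a frequent path index concrete, the following is a minimal sketch and not the FPIRPQ implementation. It assumes an RDF graph given as (subject, predicate, object) triples, indexes only label paths of length 2, and treats a path as "frequent" when it connects at least `min_support` node pairs. The function names, the support definition, and the toy data are all hypothetical, and the evaluator handles only concatenation queries such as `advisor/worksFor/subOrganizationOf`, a restricted fragment of RPQs.

```python
from collections import defaultdict

def edge_pairs(triples):
    """Map each predicate label to its set of (subject, object) pairs."""
    by_label = defaultdict(set)
    for s, p, o in triples:
        by_label[p].add((s, o))
    return by_label

def join(left, right):
    """Compose two pair sets: (x, z) whenever (x, y) in left and (y, z) in right."""
    by_start = defaultdict(set)
    for y, z in right:
        by_start[y].add(z)
    return {(x, z) for x, y in left for z in by_start.get(y, ())}

def build_frequent_path_index(by_label, min_support):
    """Index each length-2 label path that connects at least min_support pairs.

    Assumption: support is the number of (start, end) pairs, chosen here
    only for illustration.
    """
    index = {}
    for p1, pairs1 in by_label.items():
        for p2, pairs2 in by_label.items():
            pairs = join(pairs1, pairs2)
            if len(pairs) >= min_support:
                index[(p1, p2)] = pairs
    return index

def evaluate_path(labels, by_label, index):
    """Evaluate labels[0]/labels[1]/..., using indexed 2-label segments when available."""
    result, i = None, 0
    while i < len(labels):
        segment = tuple(labels[i:i + 2])
        if segment in index:
            # Precomputed frequent path: skip one join.
            step, i = index[segment], i + 2
        else:
            # Fall back to the pairs of a single edge label.
            step, i = by_label.get(labels[i], set()), i + 1
        result = step if result is None else join(result, step)
    return result if result is not None else set()

# Hypothetical toy graph in the spirit of a university dataset.
triples = [
    ("a", "advisor", "b"),
    ("c", "advisor", "b"),
    ("b", "worksFor", "d1"),
    ("d1", "subOrganizationOf", "u1"),
]
by_label = edge_pairs(triples)
index = build_frequent_path_index(by_label, min_support=2)
result = evaluate_path(["advisor", "worksFor", "subOrganizationOf"], by_label, index)
print(sorted(result))  # [('a', 'u1'), ('c', 'u1')]
```

In this toy graph only `advisor/worksFor` reaches the support threshold, so the query reads that segment from the index and performs a single join with `subOrganizationOf`.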











Data availability
The queries (Q1–Q12) designed on LUBM and DBpedia are available on GitHub (https://github.com/haowq0417/FPIRPQ), and all other data generated or analyzed during this study are included in this published article.
References
Ernst, P., Meng, C., Siu, A., Weikum, G.: KnowLife: A knowledge graph for health and life sciences. IEEE Computer Society (2014)
Shi, L., Li, S., Yang, X., Qi, J., Pan, G., Zhou, B.: Semantic health knowledge graph: semantic integration of heterogeneous medical knowledge and services. BioMed Research International 2017 (2017)
Rotmensch, M., Halpern, Y., Tlimat, A., Horng, S., Sontag, D.: Learning a health knowledge graph from electronic medical records. Scientific Reports 7(1), 1–11 (2017)
Liu, J., Lu, Z., Du, W.: Combining enterprise knowledge graph and news sentiment analysis for stock price prediction. In: Proceedings of the 52nd Hawaii International Conference on System Sciences (2019)
Ulicny, B.: Constructing knowledge graphs with trust. In: 4th International Workshop on Methods for Establishing Trust of (Open) Data, Bethlehem, USA (2015)
Chen, P., Lu, Y., Zheng, V.W., Chen, X., Yang, B.: KnowEdu: a system to construct knowledge graph for education. IEEE Access 6, 31553–31563 (2018)
Grévisse, C., Manrique, R., Mariño, O., Rothkugel, S.: Knowledge graph-based teacher support for learning material authoring. In: Colombian Conference on Computing, pp 177–191. Springer (2018)
Consortium, W.W.W., et al.: RDF 1.1 concepts and abstract syntax (2014)
Consortium, W.W.W., et al.: SPARQL 1.1 query language (2013)
Kostylev, E.V., Reutter, J.L., Romero, M., Vrgoč, D.: SPARQL with property paths. In: International Semantic Web Conference, pp 3–18. Springer (2015)
Wang, X., Wang, S., Xin, Y., Yang, Y., Li, J., Wang, X.: Distributed Pregel-based provenance-aware regular path query processing on RDF knowledge graphs. World Wide Web, 1–32 (2019)
Liu, B., Wang, X., Liu, P., Li, S., Wang, X.: PAIRPQ: An efficient path index for regular path queries on knowledge graphs. In: Asia-Pacific Web (APWeb) and Web-Age Information Management (WAIM) Joint International Conference on Web and Big Data, pp 106–120. Springer (2021)
Yan, X., Han, J.: gSpan: Graph-based substructure pattern mining. In: 2002 IEEE International Conference on Data Mining, 2002. Proceedings, pp 721–724. IEEE (2002)
Holder, L.B., Cook, D.J., Djoko, S., et al.: Substructure discovery in the SUBDUE system. In: KDD Workshop, pp 169–180, Washington, DC, USA (1994)
Ghazizadeh, S., Chawathe, S.S.: Seus: Structure extraction using summaries. In: International Conference on Discovery Science, pp 71–85. Springer (2002)
Goldman, R., Widom, J.: DataGuides: Enabling Query Formulation and Optimization in Semistructured Databases. Technical report, Stanford (1997)
Goldman, R.: Approximate DataGuides. In: Workshop on Query Processing for Semistructured Data and Non-Standard Data Formats. http://www-db.stanford.edu/pub/papers/adg.ps (1999)
Hopcroft, J.E., Motwani, R., Ullman, J.D.: Introduction to automata theory, languages, and computation. ACM SIGACT News 32(1), 60–65 (2001)
Milo, T., Suciu, D.: Index structures for path expressions. In: International Conference on Database Theory, pp 277–295. Springer (1999)
Kaushik, R., Shenoy, P., Bohannon, P., Gudes, E.: Exploiting local similarity for indexing paths in graph-structured data. In: Proceedings 18th International Conference on Data Engineering, pp 129–140. IEEE (2002)
Chen, Q., Lim, A., Ong, K.W.: D(k)-index: An adaptive structural summary for graph-structured data. In: Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data, pp 134–144 (2003)
He, H., Yang, J.: Multiresolution indexing of XML for frequent queries. In: Proceedings. 20th International Conference on Data Engineering, pp 683–694. IEEE (2004)
Erling, O., Mikhailov, I.: RDF support in the Virtuoso DBMS. In: Networked Knowledge-Networked Media, pp 7–24. Springer (2009)
Das, S., Agrawal, D., El Abbadi, A.: G-store: A scalable data store for transactional multi key access in the cloud. In: Proceedings of the 1st ACM Symposium on Cloud Computing, pp 163–174 (2010)
Liu, B., Wang, X., Liu, P., Li, S., Zhang, X., Yang, Y.: Knowledge graph database system with unified model and query languages. Ruan Jian Xue Bao/Journal of Software (in Chinese) 32(3), 781–804 (2021)
Brzozowski, J.A.: Derivatives of regular expressions. Journal of the ACM (JACM) 11(4), 481–494 (1964)
Zhu, F., Qu, Q., Lo, D., Yan, X., Han, J., Yu, P.S.: Mining top-k large structural patterns in a massive network. Proceedings of the VLDB Endowment 4(11), 807–818 (2011)
Vanetik, N., Gudes, E., Shimony, S.E.: Computing frequent graph patterns from semistructured data. In: 2002 IEEE International Conference on Data Mining, 2002. Proceedings, pp 458–465. IEEE (2002)
Bonifati, A., Martens, W., Timm, T.: An analytical study of large SPARQL query logs. VLDB J. 29(2), 655–679 (2020)
Guo, Y., Pan, Z., Heflin, J.: LUBM: a benchmark for OWL knowledge base systems. Journal of Web Semantics 3(2-3), 158–182 (2005)
Lehmann, J., Isele, R., Jakob, M., Jentzsch, A., Kontokostas, D., Mendes, P.N., Hellmann, S., Morsey, M., Van Kleef, P., Auer, S., et al.: DBpedia – a large-scale, multilingual knowledge base extracted from Wikipedia. Semantic Web 6(2), 167–195 (2015)
Acknowledgments
This work extends PAIRPQ: An Efficient Path Index for Regular Path Queries on Knowledge Graphs [12] and is supported by the National Key Research and Development Program of China (2019YFE0198600) and the National Natural Science Foundation of China (61972275).
Funding
This work is supported by the National Key Research and Development Program of China (2019YFE0198600) and the National Natural Science Foundation of China (61972275).
Author information
Contributions
Xin Wang and Wenqi Hao were the major contributors to writing the manuscript and preparing the figures. Yuzhou Qin and Baozhu Liu participated in the experiments and analyzed the results. All authors read and approved the final manuscript.
Ethics declarations
Human and animal ethics
Not applicable
Ethics approval and consent to participate
Not applicable
Consent for publication
Not applicable
Competing interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This article belongs to the Topical Collection: APWeb-WAIM 2021
Guest Editors: Yi Cai, Leong Hou U, Marc Spaniol, Yasushi Sakurai
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wang, X., Hao, W., Qin, Y. et al. FPIRPQ: Accelerating regular path queries on knowledge graphs. World Wide Web 26, 661–681 (2023). https://doi.org/10.1007/s11280-022-01103-5
DOI: https://doi.org/10.1007/s11280-022-01103-5