How is Your Knowledge Graph Used: Content-Centric Analysis of SPARQL Query Logs

  • Conference paper
The Semantic Web – ISWC 2023 (ISWC 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14265)

Abstract

Knowledge graphs (KGs) are used to integrate and persist information useful to organisations, communities, or the general public. It is essential to understand how KGs are used in order to evaluate the strengths and shortcomings of semantic web standards, of the data modelling choices formalised in ontologies, of the deployment settings of triple stores, etc. One source of information on the usage of KGs is their query logs, but making sense of hundreds of thousands of log entries is not trivial. Previous works that studied the logs available from public SPARQL endpoints mainly focused on the general syntactic properties of the queries, disregarding their semantics and intent. We introduce a novel, content-centric approach that we call query log summarisation, in which we group the queries that can be derived from a common pattern. The patterns considered in this work are query templates, i.e. common blueprints from which multiple queries can be generated by replacing parameters with constants. Moreover, we present an algorithm that summarises a query log as a list of templates, with time and space complexity linear in the size of the input (number and dimension of queries). We experimented with the algorithm on the query logs of the Linked SPARQL Queries dataset, with promising results.
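
To make the idea of a template summary concrete, the following Python sketch groups a toy log by template. It is not the authors' algorithm, only a simplified illustration under stated assumptions: a crude regex tokeniser (not a SPARQL parser) treats full IRIs, prefixed names, double-quoted literals and numbers as constants; queries whose remaining text is identical fall into the same group; within a group, a constant that is the same in every query stays fixed, while one that varies becomes a parameter, named `$_1`, `$_2`, ... here following the `$` and underscore convention described in the notes. The names `CONSTANT`, `split_constants` and `summarise` are hypothetical. Each query is scanned once and each constant is visited once more within its group, so the expected cost grows linearly with the total size of the log (grouping relies on hashing).

```python
import re
from collections import defaultdict

# Assumed, crude tokeniser (not a SPARQL parser): these token kinds are
# treated as constants that a template may turn into parameters.
CONSTANT = re.compile(
    r'<[^>\s]*>'                      # full IRIs, e.g. <http://dbpedia.org/resource/Rome>
    r'|"(?:[^"\\]|\\.)*"'             # double-quoted string literals
    r'|\b[A-Za-z][\w-]*:[\w-]*'       # prefixed names, e.g. dbr:Rome, foaf:name
    r'|(?<![\w?$])\d+(?:\.\d+)?\b'    # numbers not embedded in names such as ?x1
)


def split_constants(query):
    """Split a query into its skeleton (the text between constants, with
    whitespace normalised) and the list of constants in order of appearance."""
    parts, constants, pos = [], [], 0
    for m in CONSTANT.finditer(query):
        parts.append(" ".join(query[pos:m.start()].split()))
        constants.append(m.group())
        pos = m.end()
    parts.append(" ".join(query[pos:].split()))
    return tuple(parts), constants


def summarise(queries):
    """Return (template, number_of_queries) pairs, most frequent first."""
    groups = defaultdict(list)  # skeleton -> constants of each query in the group
    for q in queries:
        skeleton, constants = split_constants(q)
        groups[skeleton].append(constants)

    templates = []
    for skeleton, instances in groups.items():
        pieces, n_params = [skeleton[0]], 0
        for i, text_after in enumerate(skeleton[1:]):
            values = {inst[i] for inst in instances}
            if len(values) == 1:
                pieces.append(instances[0][i])  # same in every query: keep the constant
            else:
                n_params += 1
                pieces.append(f"$_{n_params}")  # varies: replace it with a parameter
            pieces.append(text_after)
        templates.append((" ".join(p for p in pieces if p), len(instances)))
    return sorted(templates, key=lambda t: -t[1])


log = [
    "SELECT ?n WHERE { dbr:Rome foaf:name ?n }",
    "SELECT ?n WHERE { dbr:Paris foaf:name ?n }",
    "SELECT ?n WHERE { dbr:Berlin foaf:name ?n }",
    "SELECT (COUNT(*) AS ?c) WHERE { ?s a dbo:City }",
]
for template, count in summarise(log):
    print(count, template)
# 3 SELECT ?n WHERE { $_1 foaf:name ?n }
# 1 SELECT (COUNT(*) AS ?c) WHERE { ?s a dbo:City }
```

This simplification only groups queries that contain the same number of constants in the same positions, and its parameter names are purely positional; it shows what a template summary looks like rather than reproducing the paper's method or results.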

An extended version of this paper (pre-print) is available at https://doi.org/10.6084/m9.figshare.23751243.

Notes

  1. http://usewod.org/workshops.html.

  2. Using an initial underscore in the variable name to identify parameters matches existing practice [27], while using “$” visually distinguishes the parameters from query variables, which often start with “?”.

  3. For brevity, the queries omit prefix declarations:

    • dbr: <http://dbpedia.org/resource/>

    • rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

    • foaf: <http://xmlns.com/foaf/0.1/>

    • dbo: <http://dbpedia.org/ontology/>

  4. For example, it is a recommended way to perform query federation [34].

  5. https://github.com/RubenVerborgh/SPARQL.js.

  6. https://jena.apache.org/documentation/fuseki2.

  7. https://github.com/miguel76/sparql-clustering.

  8. http://lsq.aksw.org/.

  9. For conciseness, the table shows the statistics of the Bio2RDF endpoints only in aggregate, for the whole project. Appendix B of the extended version of the paper contains a more detailed version of the table, with statistics for each endpoint.

  10. This choice is motivated by the fact that the Bio2RDF endpoints are part of the same project, the collected logs refer roughly to the same period, and there is considerable overlap in the clients querying the endpoints.

  11. With the exception of the Bio2RDF endpoints, which are considered as a whole.

  12. One counts the triples in which one resource is the subject and the other is the object; the other counts the triples in which the two resources replace each other or play symmetric roles.

  13. https://doi.org/10.6084/m9.figshare.23751138.

  14. http://lsq.aksw.org/.
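
The notes describe two conventions that matter when reusing the paper's queries: template parameters are marked with “$” and an initial underscore, and prefix declarations are omitted. The Python sketch below assumes parameters are written as `$_name` (for example `$_city`), generates a query by substituting constants for them while leaving ordinary `?` variables untouched, and prepends PREFIX declarations for the namespaces listed in note 3. The helpers `parameters`, `instantiate` and `with_prefixes` are hypothetical, not part of the paper's tooling; the `dbo:` namespace is written as `http://dbpedia.org/ontology/`, the namespace DBpedia uses for its ontology.

```python
import re

# Namespaces from note 3 (dbo: uses the http:// DBpedia ontology namespace).
PREFIXES = {
    "dbr": "http://dbpedia.org/resource/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dbo": "http://dbpedia.org/ontology/",
}

# Assumed parameter syntax: "$_" followed by a name, e.g. $_city.
# Ordinary query variables such as ?name (or $name) do not match.
PARAM = re.compile(r"\$_(\w+)")


def parameters(template):
    """List the parameter names of a template, in order of first occurrence."""
    return list(dict.fromkeys(PARAM.findall(template)))


def instantiate(template, bindings):
    """Generate a query by replacing each $_name parameter with bindings[name]."""
    def substitute(match):
        name = match.group(1)
        if name not in bindings:
            raise KeyError(f"no constant bound to parameter $_{name}")
        return bindings[name]
    return PARAM.sub(substitute, template)


def with_prefixes(query, prefixes=PREFIXES):
    """Prepend the SPARQL PREFIX declarations that the printed queries omit."""
    header = "\n".join(f"PREFIX {p}: <{iri}>" for p, iri in prefixes.items())
    return header + "\n" + query


template = "SELECT ?name WHERE { $_city foaf:name ?name }"
print(parameters(template))  # ['city']
query = instantiate(template, {"city": "dbr:Rome"})
print(query)  # SELECT ?name WHERE { dbr:Rome foaf:name ?name }
print(with_prefixes(query))
```

Raising an error for an unbound parameter, rather than leaving `$_city` in place, avoids silently producing a query in which the parameter would be read as an ordinary SPARQL variable.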

References

  1. Aljaloud, S., Luczak-Rösch, M., Chown, T., Gibbins, N.: Get all, filter details - on the use of regular expressions in SPARQL queries. In: Proceedings of the Workshop on Usage Analysis and the Web of Data (USEWOD 2014) (2014)

  2. Arias, M., Fernandez, J.D., Martinez-Prieto, M.A., de la Fuente, P.: An empirical study of real-world SPARQL queries. In: Proceedings of Usage Analysis and the Web of Data (USEWOD 2011) (2011)

  3. Asprino, L., Basile, V., Ciancarini, P., Presutti, V.: Empirical analysis of foundational distinctions in linked open data. In: Proceedings of the 27th International Joint Conference on Artificial Intelligence and the 23rd European Conference on Artificial Intelligence (IJCAI-ECAI 2018), pp. 3962–3969 (2018). https://doi.org/10.24963/ijcai.2018/551

  4. Asprino, L., Beek, W., Ciancarini, P., van Harmelen, F., Presutti, V.: Observing LOD using equivalent set graphs: it is mostly flat and sparsely linked. In: Ghidini, C., et al. (eds.) ISWC 2019, Part I. LNCS, vol. 11778, pp. 57–74. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-30793-6_4

  5. Asprino, L., Carriero, V.A., Presutti, V.: Extraction of common conceptual components from multiple ontologies. In: Proceedings of the International Conference on Knowledge Capture (K-CAP 2021), pp. 185–192 (2021). https://doi.org/10.1145/3460210.3493542

  6. Asprino, L., Presutti, V.: Observing LOD: its knowledge domains and the varying behavior of ontologies across them. IEEE Access 11, 21127–21143 (2023). https://doi.org/10.1109/ACCESS.2023.3250105

  7. Auer, S., Bizer, C., Kobilarov, G., Lehmann, J., Cyganiak, R., Ives, Z.: DBpedia: a nucleus for a web of open data. In: Aberer, K., et al. (eds.) ASWC/ISWC 2007. LNCS, vol. 4825, pp. 722–735. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-76298-0_52

  8. Belleau, F., Nolin, M.A., Tourigny, N., Rigault, P., Morissette, J.: Bio2RDF: towards a mashup to build bioinformatics knowledge systems. J. Biomed. Inform. 41(5), 706–716 (2008)

  9. Bielefeldt, A., Gonsior, J., Krötzsch, M.: Practical linked data access via SPARQL: the case of Wikidata. In: Proceedings of the Workshop on Linked Data on the Web co-located with the Web Conference (LDOW@WWW 2018) (2018)

  10. Bonifati, A., Martens, W., Timm, T.: Navigating the maze of Wikidata query logs. In: Proceedings of The Web Conference (WWW 2019), pp. 127–138 (2019). https://doi.org/10.1145/3308558.3313472

  11. Bonifati, A., Martens, W., Timm, T.: An analytical study of large SPARQL query logs. VLDB J. 29(2–3), 655–679 (2020). https://doi.org/10.1007/s00778-019-00558-9

  12. Chekol, M.W., Euzenat, J., Genevès, P., Layaïda, N.: SPARQL query containment under SHI axioms. In: Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence (AAAI 2012) (2012)

  13. Cyganiak, R., Wood, D., Lanthaler, M.: RDF 1.1 Concepts and Abstract Syntax. http://www.w3.org/TR/2014/REC-rdf11-concepts-20140225/

  14. Deep, S., Gruenheid, A., Koutris, P., Viglas, S., Naughton, J.F.: Comprehensive and efficient workload summarization. Datenbank-Spektrum 22(3), 249–256 (2022). https://doi.org/10.1007/s13222-022-00427-w

  15. Haklay, M., Weber, P.: OpenStreetMap: user-generated street maps. IEEE Pervasive Comput. 7(4), 12–18 (2008). https://doi.org/10.1109/MPRV.2008.80

  16. Han, X., Feng, Z., Zhang, X., Wang, X., Rao, G., Jiang, S.: On the statistical analysis of practical SPARQL queries. In: Proceedings of the 19th International Workshop on Web and Databases (2016). https://doi.org/10.1145/2932194.2932196

  17. Harris, S., et al.: SPARQL 1.1 Query Language. http://www.w3.org/TR/2013/REC-sparql11-query-20130321/

  18. Hartig, O.: Provenance information in the web of data. In: Proceedings of the Workshop on Linked Data on the Web (LDOW 2009) (2009). http://ceur-ws.org/Vol-538/ldow2009_paper18.pdf

  19. Hoxha, J., Junghans, M., Agarwal, S.: Enabling semantic analysis of user browsing patterns in the web of data. In: Proceedings of Usage Analysis and the Web of Data (USEWOD 2012) (2012)

  20. Huelss, J., Paulheim, H.: What SPARQL query logs tell and do not tell about semantic relatedness in LOD - or: the unsuccessful attempt to improve the browsing experience of DBpedia by exploiting query logs. In: Proceedings of ESWC 2015, Revised Selected Papers, pp. 297–308 (2015). https://doi.org/10.1007/978-3-319-25639-9_44

  21. Kamra, A., Terzi, E., Bertino, E.: Detecting anomalous access patterns in relational databases. VLDB J. 17(5), 1063–1077 (2008). https://doi.org/10.1007/s00778-007-0051-4

  22. Kul, G., et al.: Summarizing large query logs in Ettu. CoRR (2016). http://arxiv.org/abs/1608.01013

  23. Lebo, T., Sahoo, S., McGuinness, D.: PROV-O: The PROV Ontology. https://www.w3.org/TR/2013/REC-prov-o-20130430/

  24. Luczak-Rösch, M., Bischoff, M.: Statistical analysis of web of data usage. In: Joint Workshop on Knowledge Evolution and Ontology Dynamics (EvoDyn 2011) (2011)

  25. Luczak-Rösch, M., Hollink, L., Berendt, B.: Current directions for usage analysis and the web of data: the diverse ecosystem of web of data access mechanisms. In: Proceedings of the 25th International Conference on World Wide Web (WWW 2016), pp. 885–887 (2016). https://doi.org/10.1145/2872518.2891068

  26. Mathew, S., Petropoulos, M., Ngo, H.Q., Upadhyaya, S.J.: A data-centric approach to insider attack detection in database systems. In: Proceedings of the 13th International Symposium on Recent Advances in Intrusion Detection (RAID 2010), pp. 382–401 (2010). https://doi.org/10.1007/978-3-642-15512-3_20

  27. Meroño-Peñuela, A., Hoekstra, R.: grlc makes GitHub taste like linked data APIs. In: Proceedings of ESWC 2016, pp. 342–353 (2016)

  28. Microsoft: Automatic Tuning - Microsoft SQL Server. https://learn.microsoft.com/en-us/sql/relational-databases/automatic-tuning/automatic-tuning?view=sql-server-ver16

  29. Möller, K., Hausenblas, M., Cyganiak, R., Handschuh, S.: Learning from linked open data usage: patterns & metrics. In: Proceedings of the Web Science Conference (2010)

  30. Möller, K., Heath, T., Handschuh, S., Domingue, J.: Recipes for semantic web dog food - the ESWC and ISWC metadata projects. In: Proceedings of the 6th International Semantic Web Conference and the 2nd Asian Semantic Web Conference, ISWC-ASWC 2007, pp. 802–815 (2007). https://doi.org/10.1007/978-3-540-76298-0_58

  31. Oracle: Automatic Indexing - Oracle SQL Developer Web. https://docs.oracle.com/en/database/oracle/sql-developer-web/19.2.1/sdweb/automatic-indexing-page.html#GUID-8198E146-1D87-4541-8EC0-56ABBF52B438

  32. Picalausa, F., Vansummeren, S.: What are real SPARQL queries like? In: Proceedings of the International Workshop on Semantic Web Information Management (SWIM 2011) (2011). https://doi.org/10.1145/1999299.1999306

  33. Pichler, R., Skritek, S.: Containment and equivalence of well-designed SPARQL. In: Proceedings of the 33rd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS 2014), pp. 39–50 (2014). https://doi.org/10.1145/2594538.2594542

  34. Prud’hommeaux, E., Buil-Aranda, C.: SPARQL 1.1 Federated Query. http://www.w3.org/TR/2013/REC-sparql11-federated-query-20130321/

  35. Raghuveer, A.: Characterizing machine agent behavior through SPARQL query mining. In: Proceedings of the Workshop on Usage Analysis and the Web of Data (USEWOD 2012) (2012)

  36. Rietveld, L., Hoekstra, R., et al.: Man vs. machine: differences in SPARQL queries. In: Proceedings of the Workshop on Usage Analysis and the Web of Data (USEWOD 2014) (2014)

  37. Saleem, M., Ali, M.I., Hogan, A., Mehmood, Q., Ngomo, A.-C.N.: LSQ: the linked SPARQL queries dataset. In: Arenas, M., et al. (eds.) ISWC 2015, Part II. LNCS, vol. 9367, pp. 261–269. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-25010-6_15

  38. Schoenfisch, J., Stuckenschmidt, H.: Analyzing real-world SPARQL queries and ontology-based data access in the context of probabilistic data. Int. J. Approx. Reason. 90, 374–388 (2017). https://doi.org/10.1016/j.ijar.2017.08.005

  39. Shannon, C.E.: A mathematical theory of communication. Bell Syst. Tech. J. 27(3), 379–423 (1948). https://doi.org/10.1002/j.1538-7305.1948.tb01338.x

  40. Stadler, C., Lehmann, J., Höffner, K., Auer, S.: LinkedGeoData: a core for a web of spatial open data. Semant. Web 3(4), 333–354 (2012). https://doi.org/10.3233/SW-2011-0052

  41. Stadler, C., et al.: LSQ 2.0: a linked dataset of SPARQL query logs (Preprint) (2022). https://aidanhogan.com/docs/lsq-sparql-logs.pdf

  42. Vrandečić, D.: Wikidata: a new platform for collaborative data collection. In: Proceedings of the 21st International Conference on World Wide Web (WWW 2012), pp. 1063–1064 (2012). https://doi.org/10.1145/2187980.2188242

  43. Wang, J., et al.: Real-time workload pattern analysis for large-scale cloud databases. arXiv e-prints arXiv:2307.02626, July 2023. https://doi.org/10.48550/arXiv.2307.02626

  44. Xie, T., Chandola, V., Kennedy, O.: Query log compression for workload analytics. Proc. VLDB Endow. 12(3), 183–196 (2018). https://doi.org/10.14778/3291264.3291265

Acknowledgements

This work was partially supported by the PNRR project “Fostering Open Science in Social Science Research (FOSSR)” (CUP B83C22003950001) and by the PNRR MUR project PE0000013-FAIR.

Author information

Correspondence to Miguel Ceriani.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Asprino, L., Ceriani, M. (2023). How is Your Knowledge Graph Used: Content-Centric Analysis of SPARQL Query Logs. In: Payne, T.R., et al. The Semantic Web – ISWC 2023. ISWC 2023. Lecture Notes in Computer Science, vol 14265. Springer, Cham. https://doi.org/10.1007/978-3-031-47240-4_11

  • DOI: https://doi.org/10.1007/978-3-031-47240-4_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-47239-8

  • Online ISBN: 978-3-031-47240-4

  • eBook Packages: Computer Science, Computer Science (R0)