Abstract
Knowledge graphs (KGs) are used to integrate and persist information useful to organisations, communities, or the general public. Understanding how KGs are used is essential for evaluating the strengths and shortcomings of Semantic Web standards, the data modelling choices formalised in ontologies, the deployment settings of triple stores, and so on. One source of information on KG usage is query logs, but making sense of hundreds of thousands of log entries is not trivial. Previous works that studied the available logs of public SPARQL endpoints mainly focused on the general syntactic properties of the queries, disregarding their semantics and intent. We introduce a novel, content-centric approach that we call query log summarisation, in which we group the queries that can be derived from a common pattern. The patterns considered in this work are query templates, i.e. common blueprints from which multiple queries can be generated by replacing parameters with constants. Moreover, we present an algorithm that summarises a query log as a list of templates and whose time and space complexity is linear in the size of the input (number and size of the queries). We ran the algorithm on the query logs of the Linked SPARQL Queries dataset, with promising results.
An extended version of this paper (pre-print) is available at https://doi.org/10.6084/m9.figshare.23751243.
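To make the notion of a query template concrete, the following is a minimal sketch, not the algorithm presented in the paper. It assumes that constants can be recognised with a simple regular expression (full IRIs in angle brackets, double-quoted string literals, and bare numbers), whereas a real implementation would parse the SPARQL syntax. The names `CONSTANT`, `to_template`, and `summarise` are hypothetical; the `$_n` parameter naming follows the convention described in the notes. Each query is visited once and grouped through a hash table keyed by its template, so the work grows linearly with the size of the log.

```python
import re
from collections import defaultdict

# Assumption: a simplified matcher for constants. It covers IRIs in angle
# brackets, double-quoted string literals, and bare integers or decimals.
# Prefixed names, language tags, and datatypes are deliberately ignored, and
# a FILTER such as ?x<5&&?y>3 written without spaces would be misread as an IRI.
CONSTANT = re.compile(r'<[^<>\s]*>|"(?:[^"\\]|\\.)*"|\b\d+(?:\.\d+)?\b')


def to_template(query):
    """Return (template, constants): every constant becomes $_1, $_2, ..."""
    constants = []

    def substitute(match):
        constants.append(match.group(0))
        return f"$_{len(constants)}"

    # Collapse whitespace first so that layout differences do not split groups.
    normalised = " ".join(query.split())
    return CONSTANT.sub(substitute, normalised), tuple(constants)


def summarise(log):
    """Group the queries of a log by template in a single pass."""
    groups = defaultdict(list)
    for query in log:
        template, constants = to_template(query)
        groups[template].append(constants)
    return dict(groups)


example_log = [
    "SELECT ?p WHERE { <http://dbpedia.org/resource/Rome> ?p ?o }",
    "SELECT ?p WHERE { <http://dbpedia.org/resource/Paris> ?p ?o }",
    'SELECT ?s WHERE { ?s ?p "Rome" } LIMIT 10',
]

for template, bindings in summarise(example_log).items():
    print(len(bindings), template)
```

In this sketch the first two queries fall under the same template because they differ only in the IRI that fills the single parameter. Note that every constant is turned into a parameter here, which is a simplification chosen for brevity.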
Notes
- 1.
- 2.
Using an initial underscore in the variable name to identify parameters matches existing practice [27], while using “$” visually helps distinguish the parameters from query variables, which often start with “?”.
- 3.
For brevity, the queries omit the following prefix declarations:
- dbr: <http://dbpedia.org/resource/>
- rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
- foaf: <http://xmlns.com/foaf/0.1/>
- dbo: <https://dbpedia.org/ontology/>
- 4.
It is, for example, a recommended way to perform query federation [34].
- 5.
- 6.
- 7.
- 8.
- 9.
In the table, for conciseness, the statistics of the Bio2RDF endpoints are shown only in aggregate for the whole project. Appendix B of the extended version of the paper contains a more detailed version of the table, showing the statistics endpoint by endpoint.
- 10.
This choice is motivated by the fact that the Bio2RDF endpoints are part of the same project, the collected logs refer roughly to the same period, and there is considerable overlap in the clients querying the endpoints.
- 11.
With the exception of the Bio2RDF endpoints, which are considered as a whole.
- 12.
One counts the triples in which one resource is the subject and the other is the object; the other counts the triples in which the two resources swap roles or have a symmetric role.
- 13.
- 14.
References
Aljaloud, S., Luczak-Rösch, M., Chown, T., Gibbins, N.: Get all, filter details-on the use of regular expressions in SPARQL queries. In: Proceedings of the Workshop on Usage Analysis and the Web of Data (USEWOD 2014) (2014)
Arias, M., Fernandez, J.D., Martinez-Prieto, M.A., de la Fuente, P.: An empirical study of real-world SPARQL queries. In: Proceedings of Usage Analysis and the Web of Data (USEWOD 2011) (2011)
Asprino, L., Basile, V., Ciancarini, P., Presutti, V.: Empirical analysis of foundational distinctions in linked open data. In: Proceedings of the 27th International Joint Conference on Artificial Intelligence and the 23rd European Conference on Artificial Intelligence (IJCAI-ECAI 2018), pp. 3962–3969 (2018). https://doi.org/10.24963/ijcai.2018/551
Asprino, L., Beek, W., Ciancarini, P., van Harmelen, F., Presutti, V.: Observing LOD using equivalent set graphs: it is mostly flat and sparsely linked. In: Ghidini, C., et al. (eds.) ISWC 2019, Part I. LNCS, vol. 11778, pp. 57–74. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-30793-6_4
Asprino, L., Carriero, V.A., Presutti, V.: Extraction of common conceptual components from multiple ontologies. In: Proceedings of the International Conference on Knowledge Capture (K-CAP 2021), pp. 185–192 (2021). https://doi.org/10.1145/3460210.3493542
Asprino, L., Presutti, V.: Observing LOD: its knowledge domains and the varying behavior of ontologies across them. IEEE Access 11, 21127–21143 (2023). https://doi.org/10.1109/ACCESS.2023.3250105
Auer, S., Bizer, C., Kobilarov, G., Lehmann, J., Cyganiak, R., Ives, Z.: DBpedia: a nucleus for a web of open data. In: Aberer, K., et al. (eds.) ASWC/ISWC 2007. LNCS, vol. 4825, pp. 722–735. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-76298-0_52
Belleau, F., Nolin, M.A., Tourigny, N., Rigault, P., Morissette, J.: Bio2RDF: towards a mashup to build bioinformatics knowledge systems. J. Biomed. Inform. 41(5), 706–716 (2008)
Bielefeldt, A., Gonsior, J., Krötzsch, M.: Practical linked data access via SPARQL: the case of wikidata. In: Proceedings of the Workshop on Linked Data on the Web co-located with the Web Conference (LDOW@WWW 2018) (2018)
Bonifati, A., Martens, W., Timm, T.: Navigating the maze of wikidata query logs. In: Proceedings of The Web Conference (WWW 2019), pp. 127–138 (2019). https://doi.org/10.1145/3308558.3313472
Bonifati, A., Martens, W., Timm, T.: An analytical study of large SPARQL query logs. VLDB J. 29(2–3), 655–679 (2020). https://doi.org/10.1007/s00778-019-00558-9
Chekol, M.W., Euzenat, J., Genevès, P., Layaïda, N.: SPARQL query containment under SHI axioms. In: Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence (AAAI 2012) (2012)
Cyganiak, R., Wood, D., Lanthaler, M.: RDF 1.1 Concepts and Abstract Syntax. http://www.w3.org/TR/2014/REC-rdf11-concepts-20140225/
Deep, S., Gruenheid, A., Koutris, P., Viglas, S., Naughton, J.F.: Comprehensive and efficient workload summarization. Datenbank-Spektrum 22(3), 249–256 (2022). https://doi.org/10.1007/s13222-022-00427-w
Haklay, M., Weber, P.: Openstreetmap: user-generated street maps. IEEE Pervasive Comput. 7(4), 12–18 (2008). https://doi.org/10.1109/MPRV.2008.80
Han, X., Feng, Z., Zhang, X., Wang, X., Rao, G., Jiang, S.: On the statistical analysis of practical SPARQL queries. In: Proceedings of the 19th International Workshop on Web and Databases (2016). https://doi.org/10.1145/2932194.2932196
Harris, S., et al.: SPARQL 1.1 Query Language. http://www.w3.org/TR/2013/REC-sparql11-query-20130321/
Hartig, O.: Provenance information in the web of data. In: Proceedings of the Workshop on Linked Data on the Web (LDOW 2009) (2009). http://ceur-ws.org/Vol-538/ldow2009_paper18.pdf
Hoxha, J., Junghans, M., Agarwal, S.: Enabling semantic analysis of user browsing patterns in the web of data. In: Proceedings of Usage Analysis and the Web of Data (USEWOD 2012) (2012)
Huelss, J., Paulheim, H.: What SPARQL query logs tell and do not tell about semantic relatedness in LOD - or: the unsuccessful attempt to improve the browsing experience of DBPedia by exploiting query logs. In: Proceedings of ESWC 2015, Revised Selected Papers, pp. 297–308 (2015). https://doi.org/10.1007/978-3-319-25639-9_44
Kamra, A., Terzi, E., Bertino, E.: Detecting anomalous access patterns in relational databases. VLDB J. 17(5), 1063–1077 (2008). https://doi.org/10.1007/s00778-007-0051-4
Kul, G., et al.: Summarizing large query logs in Ettu. CoRR (2016). http://arxiv.org/abs/1608.01013
Lebo, T., Sahoo, S., McGuinness, D.: PROV-O: The PROV Ontology. https://www.w3.org/TR/2013/REC-prov-o-20130430/
Luczak-Rösch, M., Bischoff, M.: Statistical analysis of web of data usage. In: Joint Workshop on Knowledge Evolution and Ontology Dynamics (EvoDyn 2011) (2011)
Luczak-Rösch, M., Hollink, L., Berendt, B.: Current directions for usage analysis and the web of data: the diverse ecosystem of web of data access mechanisms. In: Proceedings of the 25th International Conference on World Wide Web (WWW 2016), pp. 885–887 (2016). https://doi.org/10.1145/2872518.2891068
Mathew, S., Petropoulos, M., Ngo, H.Q., Upadhyaya, S.J.: A data-centric approach to insider attack detection in database systems. In: Proceedings of the 13th International Symposium on Recent Advances in Intrusion Detection (RAID 2010), pp. 382–401 (2010). https://doi.org/10.1007/978-3-642-15512-3_20
Meroño-Peñuela, A., Hoekstra, R.: grlc makes GitHub taste like linked data APIs. In: Proceedings of ESWC 2016, pp. 342–353 (2016)
Microsoft: Automatic Tuning - Microsoft SQL Server. https://learn.microsoft.com/en-us/sql/relational-databases/automatic-tuning/automatic-tuning?view=sql-server-ver16
Möller, K., Hausenblas, M., Cyganiak, R., Handschuh, S.: Learning from linked open data usage: patterns & metrics. In: Proceedings of the Web Science Conference (2010)
Möller, K., Heath, T., Handschuh, S., Domingue, J.: Recipes for semantic web dog food - the ESWC and ISWC metadata projects. In: Proceedings of the 6th International Semantic Web Conference and the 2nd Asian Semantic Web Conference, ISWC-ASWC 2007, pp. 802–815 (2007). https://doi.org/10.1007/978-3-540-76298-0_58
Oracle: Automatic Indexing - Oracle SQL Developer Web. https://docs.oracle.com/en/database/oracle/sql-developer-web/19.2.1/sdweb/automatic-indexing-page.html#GUID-8198E146-1D87-4541-8EC0-56ABBF52B438
Picalausa, F., Vansummeren, S.: What are real SPARQL queries like? In: Proceedings of the International Workshop on Semantic Web Information Management (SWIM 2011) (2011). https://doi.org/10.1145/1999299.1999306
Pichler, R., Skritek, S.: Containment and equivalence of well-designed SPARQL. In: Proceedings of the 33rd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS 2014), pp. 39–50 (2014). https://doi.org/10.1145/2594538.2594542
Prud’hommeaux, E., Buil-Aranda, C.: SPARQL 1.1 Federated Query. http://www.w3.org/TR/2013/REC-sparql11-federated-query-20130321/
Raghuveer, A.: Characterizing machine agent behavior through SPARQL query mining. In: Proceedings of the Workshop on Usage Analysis and the Web of Data (USEWOD 2012) (2012)
Rietveld, L., Hoekstra, R., et al.: Man vs. machine: differences in SPARQL queries. In: Proceedings of the Workshop on Usage Analysis and the Web of Data (USEWOD 2014) (2014)
Saleem, M., Ali, M.I., Hogan, A., Mehmood, Q., Ngomo, A.-C.N.: LSQ: the linked SPARQL queries dataset. In: Arenas, M., et al. (eds.) ISWC 2015, Part II. LNCS, vol. 9367, pp. 261–269. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-25010-6_15
Schoenfisch, J., Stuckenschmidt, H.: Analyzing real-world SPARQL queries and ontology-based data access in the context of probabilistic data. Int. J. Approx. Reason. 90, 374–388 (2017). https://doi.org/10.1016/j.ijar.2017.08.005
Shannon, C.E.: A mathematical theory of communication. Bell Syst. Tech. J. 27(3), 379–423 (1948). https://doi.org/10.1002/j.1538-7305.1948.tb01338.x
Stadler, C., Lehmann, J., Höffner, K., Auer, S.: LinkedGeoData: a core for a web of spatial open data. Semant. Web 3(4), 333–354 (2012). https://doi.org/10.3233/SW-2011-0052
Stadler, C., et al.: LSQ 2.0: a linked dataset of SPARQL query logs (Preprint) (2022). https://aidanhogan.com/docs/lsq-sparql-logs.pdf
Vrandečić, D.: WikiData: a new platform for collaborative data collection. In: Proceedings of the 21st International Conference on World Wide Web (WWW 2012), pp. 1063–1064 (2012). https://doi.org/10.1145/2187980.2188242
Wang, J., et al.: Real-time workload pattern analysis for large-scale cloud databases. arXiv e-prints arXiv:2307.02626, July 2023. https://doi.org/10.48550/arXiv.2307.02626
Xie, T., Chandola, V., Kennedy, O.: Query log compression for workload analytics. Proc. VLDB Endow. 12(3), 183–196 (2018). https://doi.org/10.14778/3291264.3291265
Acknowledgements
This work was partially supported by the PNRR project “Fostering Open Science in Social Science Research (FOSSR)” (CUP B83C22003950001) and by the PNRR MUR project PE0000013-FAIR.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Asprino, L., Ceriani, M. (2023). How is Your Knowledge Graph Used: Content-Centric Analysis of SPARQL Query Logs. In: Payne, T.R., et al. The Semantic Web – ISWC 2023. ISWC 2023. Lecture Notes in Computer Science, vol 14265. Springer, Cham. https://doi.org/10.1007/978-3-031-47240-4_11
DOI: https://doi.org/10.1007/978-3-031-47240-4_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-47239-8
Online ISBN: 978-3-031-47240-4
eBook Packages: Computer Science, Computer Science (R0)