How is Your Knowledge Graph Used: Content-Centric Analysis of SPARQL Query Logs

Asprino, Luigi; Ceriani, Miguel

doi:10.1007/978-3-031-47240-4_11

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14265)

Included in the following conference series: International Semantic Web Conference

Abstract

Knowledge graphs (KGs) are used to integrate and persist information useful to organisations, communities, or the general public. It is essential to understand how KGs are used so as to evaluate the strengths and shortcomings of semantic web standards, of the data modelling choices formalised in ontologies, of the deployment settings of triple stores, and so on. One source of information on KG usage is query logs, but making sense of hundreds of thousands of log entries is not trivial. Previous works that studied the available logs of public SPARQL endpoints mainly focused on the general syntactic properties of the queries, disregarding their semantics and intent. We introduce a novel, content-centric approach that we call query log summarisation, in which we group the queries that can be derived from a common pattern. The patterns considered in this work are query templates, i.e. common blueprints from which multiple queries can be generated by replacing parameters with constants. Moreover, we present an algorithm that summarises a query log as a list of templates and whose time and space complexity is linear in the size of the input (number and size of the queries). We ran the algorithm on the query logs of the Linked SPARQL Queries dataset, with promising results.

An extended version of this paper (pre-print) is available at https://doi.org/10.6084/m9.figshare.23751243.
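The idea of query log summarisation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it assumes that constants can be spotted with a regular expression (IRIs in angle brackets, double-quoted strings, bare numbers), whereas a real implementation would presumably work on parsed queries. The function names `to_template` and `summarise` and the `$_1`, `$_2`, ... numbering are hypothetical choices made for this illustration.

```python
import re
from collections import defaultdict

# ASSUMPTION: constants are recognised textually, not by parsing SPARQL.
# Matched: <IRIs>, "double-quoted strings", and bare numbers.
# Prefixed names such as foaf:name are left untouched (a simplification).
CONSTANT = re.compile(
    r'<[^<>\s]*>|"(?:[^"\\]|\\.)*"|(?<![\w:?$])\d+(?:\.\d+)?\b'
)


def to_template(query):
    """Return (template, constants) for a single query string."""
    constants = []

    def replace(match):
        # Each constant becomes the next numbered parameter.
        constants.append(match.group(0))
        return f"$_{len(constants)}"

    template = CONSTANT.sub(replace, query)
    # Collapse whitespace only after literals have been replaced.
    return " ".join(template.split()), tuple(constants)


def summarise(log):
    """Group queries by template, most frequent template first."""
    groups = defaultdict(list)
    for query in log:  # one pass over the log, hashing each template
        template, constants = to_template(query)
        groups[template].append(constants)
    # The sort is only for presentation and is not part of the grouping pass.
    return sorted(groups.items(), key=lambda item: len(item[1]), reverse=True)


if __name__ == "__main__":
    log = [
        "SELECT ?n WHERE { <http://dbpedia.org/resource/Rome> foaf:name ?n }",
        "SELECT ?n WHERE { <http://dbpedia.org/resource/Paris>  foaf:name ?n }",
        'ASK { ?s ?p "x" }',
    ]
    for template, bindings in summarise(log):
        print(len(bindings), template)
```

In this sketch every constant becomes a parameter, so two queries fall into the same group exactly when they differ only in their constants and whitespace. Deciding which constants should stay fixed in a template is the harder part of the problem and is left out here.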


Notes

1. http://usewod.org/workshops.html.
2. Using an initial underscore in the variable name to identify parameters matches existing practice [27], while using “$” helps to visually distinguish parameters from query variables, which often start with “?”.
3. For brevity, the queries omit prefix declarations:
   - dbr: <http://dbpedia.org/resource/>
   - rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
   - foaf: <http://xmlns.com/foaf/0.1/>
   - dbo: <https://dbpedia.org/ontology/>
4. It is, for example, a recommended way to perform query federation [34].
5. https://github.com/RubenVerborgh/SPARQL.js.
6. https://jena.apache.org/documentation/fuseki2.
7. https://github.com/miguel76/sparql-clustering.
8. http://lsq.aksw.org/.
9. In the table, for conciseness, the statistics of the Bio2RDF endpoints are shown only aggregated for the whole project. Appendix B of the extended version of the paper contains a more detailed version of the table, showing the statistics endpoint by endpoint.
10. This choice is motivated by the fact that the Bio2RDF endpoints are part of the same project, the collected logs refer roughly to the same period, and there is considerable overlap in the clients querying the endpoints.
11. With the exception of the Bio2RDF endpoints, which are considered as a whole.
12. One counts the triples in which one resource is the subject and the other the object; the other counts the triples in which they replace each other or have a symmetric role.
13. https://doi.org/10.6084/m9.figshare.23751138.
14. http://lsq.aksw.org/.
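
The notation described in the notes above (parameters marked with “$” and an initial underscore, queries written without prefix declarations) can be made concrete with a small Python sketch that turns a template back into a runnable query. The parameter names `$_1`, `$_2`, ..., the example template, and the helper names `instantiate` and `with_prefixes` are assumptions made for illustration; only the two prefix IRIs are taken from the notes.

```python
import re

# ASSUMPTION: parameters are written $_1, $_2, ... (1-based).
PARAM = re.compile(r"\$_(\d+)")

# Prefix declarations copied from the notes; only the two used below.
PREFIXES = {
    "dbr": "http://dbpedia.org/resource/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def instantiate(template, constants):
    """Replace each $_n in the template with the n-th constant."""

    def lookup(match):
        return constants[int(match.group(1)) - 1]

    return PARAM.sub(lookup, template)


def with_prefixes(query):
    """Prepend the PREFIX declarations that the queries omit for brevity."""
    header = "\n".join(f"PREFIX {p}: <{iri}>" for p, iri in PREFIXES.items())
    return header + "\n" + query


if __name__ == "__main__":
    # Hypothetical template, not taken from the paper.
    template = "SELECT ?name WHERE { $_1 foaf:name ?name }"
    print(with_prefixes(instantiate(template, ["dbr:Rome"])))
```

Writing parameters with “$” keeps them textually distinct from “?” variables, so a single regular expression is enough to find them without touching the query variables.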

References

Aljaloud, S., Luczak-Rösch, M., Chown, T., Gibbins, N.: Get all, filter details-on the use of regular expressions in SPARQL queries. In: Proceedings of the Workshop on Usage Analysis and the Web of Data (USEWOD 2014) (2014)
Arias, M., Fernandez, J.D., Martinez-Prieto, M.A., de la Fuente, P.: An empirical study of real-world SPARQL queries. In: Proceedings of Usage Analysis and the Web of Data (USEWOD 2011) (2011)
Asprino, L., Basile, V., Ciancarini, P., Presutti, V.: Empirical analysis of foundational distinctions in linked open data. In: Proceedings of the 27th International Joint Conference on Artificial Intelligence and the 23rd European Conference on Artificial Intelligence (IJCAI-ECAI 2018), pp. 3962–3969 (2018). https://doi.org/10.24963/ijcai.2018/551
Asprino, L., Beek, W., Ciancarini, P., van Harmelen, F., Presutti, V.: Observing LOD using equivalent set graphs: it is mostly flat and sparsely linked. In: Ghidini, C., et al. (eds.) ISWC 2019, Part I. LNCS, vol. 11778, pp. 57–74. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-30793-6_4
Asprino, L., Carriero, V.A., Presutti, V.: Extraction of common conceptual components from multiple ontologies. In: Proceedings of the International Conference on Knowledge Capture (K-CAP 2021), pp. 185–192 (2021). https://doi.org/10.1145/3460210.3493542
Asprino, L., Presutti, V.: Observing LOD: its knowledge domains and the varying behavior of ontologies across them. IEEE Access 11, 21127–21143 (2023). https://doi.org/10.1109/ACCESS.2023.3250105
Auer, S., Bizer, C., Kobilarov, G., Lehmann, J., Cyganiak, R., Ives, Z.: DBpedia: a nucleus for a web of open data. In: Aberer, K., et al. (eds.) ASWC/ISWC 2007. LNCS, vol. 4825, pp. 722–735. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-76298-0_52
Belleau, F., Nolin, M.A., Tourigny, N., Rigault, P., Morissette, J.: Bio2RDF: towards a mashup to build bioinformatics knowledge systems. J. Biomed. Inform. 41(5), 706–716 (2008)
Bielefeldt, A., Gonsior, J., Krötzsch, M.: Practical linked data access via SPARQL: the case of wikidata. In: Proceedings of the Workshop on Linked Data on the Web co-located with the Web Conference (LDOW@WWW 2018) (2018)
Bonifati, A., Martens, W., Timm, T.: Navigating the maze of wikidata query logs. In: Proceedings of The Web Conference (WWW 2019), pp. 127–138 (2019). https://doi.org/10.1145/3308558.3313472
Bonifati, A., Martens, W., Timm, T.: An analytical study of large SPARQL query logs. VLDB J. 29(2–3), 655–679 (2020). https://doi.org/10.1007/s00778-019-00558-9
Chekol, M.W., Euzenat, J., Genevès, P., Layaïda, N.: SPARQL query containment under SHI axioms. In: Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence (AAAI 2012) (2012)
Cyganiak, R., Wood, D., Lanthaler, M.: RDF 1.1 Concepts and Abstract Syntax. http://www.w3.org/TR/2014/REC-rdf11-concepts-20140225/
Deep, S., Gruenheid, A., Koutris, P., Viglas, S., Naughton, J.F.: Comprehensive and efficient workload summarization. Datenbank-Spektrum 22(3), 249–256 (2022). https://doi.org/10.1007/s13222-022-00427-w
Haklay, M., Weber, P.: Openstreetmap: user-generated street maps. IEEE Pervasive Comput. 7(4), 12–18 (2008). https://doi.org/10.1109/MPRV.2008.80
Han, X., Feng, Z., Zhang, X., Wang, X., Rao, G., Jiang, S.: On the statistical analysis of practical SPARQL queries. In: Proceedings of the 19th International Workshop on Web and Databases (2016). https://doi.org/10.1145/2932194.2932196
Harris, S., et al.: SPARQL 1.1 Query Language. http://www.w3.org/TR/2013/REC-sparql11-query-20130321/
Hartig, O.: Provenance information in the web of data. In: Proceedings of the Workshop on Linked Data on the Web (LDOW 2009) (2009). http://ceur-ws.org/Vol-538/ldow2009_paper18.pdf
Hoxha, J., Junghans, M., Agarwal, S.: Enabling semantic analysis of user browsing patterns in the web of data. In: Proceedings of Usage Analysis and the Web of Data (USEWOD 2012) (2012)
Huelss, J., Paulheim, H.: What SPARQL query logs tell and do not tell about semantic relatedness in LOD - or: the unsuccessful attempt to improve the browsing experience of DBPedia by exploiting query logs. In: Proceedings of ESWC 2015, Revised Selected Papers, pp. 297–308 (2015). https://doi.org/10.1007/978-3-319-25639-9_44
Kamra, A., Terzi, E., Bertino, E.: Detecting anomalous access patterns in relational databases. VLDB J. 17(5), 1063–1077 (2008). https://doi.org/10.1007/s00778-007-0051-4
Kul, G., et al.: Summarizing large query logs in Ettu. CoRR (2016). http://arxiv.org/abs/1608.01013
Lebo, T., Sahoo, S., McGuinness, D.: PROV-O: The PROV Ontology. https://www.w3.org/TR/2013/REC-prov-o-20130430/
Luczak-Rösch, M., Bischoff, M.: Statistical analysis of web of data usage. In: Joint Workshop on Knowledge Evolution and Ontology Dynamics (EvoDyn 2011) (2011)
Luczak-Rösch, M., Hollink, L., Berendt, B.: Current directions for usage analysis and the web of data: the diverse ecosystem of web of data access mechanisms. In: Proceedings of the 25th International Conference on World Wide Web (WWW 2016), pp. 885–887 (2016). https://doi.org/10.1145/2872518.2891068
Mathew, S., Petropoulos, M., Ngo, H.Q., Upadhyaya, S.J.: A data-centric approach to insider attack detection in database systems. In: Proceedings of the 13th International Symposium on Recent Advances in Intrusion Detection (RAID 2010), pp. 382–401 (2010). https://doi.org/10.1007/978-3-642-15512-3_20
Meroño-Peñuela, A., Hoekstra, R.: grlc makes GitHub taste like linked data APIs. In: Proceedings of ESWC 2016, pp. 342–353 (2016)
Microsoft: Automatic Tuning - Microsoft SQL Server. https://learn.microsoft.com/en-us/sql/relational-databases/automatic-tuning/automatic-tuning?view=sql-server-ver16
Möller, K., Hausenblas, M., Cyganiak, R., Handschuh, S.: Learning from linked open data usage: patterns & metrics. In: Proceedings of the Web Science Conference (2010)
Möller, K., Heath, T., Handschuh, S., Domingue, J.: Recipes for semantic web dog food - the ESWC and ISWC metadata projects. In: Proceedings of the 6th International Semantic Web Conference and the 2nd Asian Semantic Web Conference, ISWC-ASWC 2007, pp. 802–815 (2007). https://doi.org/10.1007/978-3-540-76298-0_58
Oracle: Automatic Indexing - Oracle SQL Developer Web. https://docs.oracle.com/en/database/oracle/sql-developer-web/19.2.1/sdweb/automatic-indexing-page.html#GUID-8198E146-1D87-4541-8EC0-56ABBF52B438
Picalausa, F., Vansummeren, S.: What are real SPARQL queries like? In: Proceedings of the International Workshop on Semantic Web Information Management (SWIM 2011) (2011). https://doi.org/10.1145/1999299.1999306
Pichler, R., Skritek, S.: Containment and equivalence of well-designed SPARQL. In: Proceedings of the 33rd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS 2014), pp. 39–50 (2014). https://doi.org/10.1145/2594538.2594542
Prud’hommeaux, E., Buil-Aranda, C.: SPARQL 1.1 Federated Query. http://www.w3.org/TR/2013/REC-sparql11-federated-query-20130321/
Raghuveer, A.: Characterizing machine agent behavior through SPARQL query mining. In: Proceedings of the Workshop on Usage Analysis and the Web of Data (USEWOD 2012) (2012)
Rietveld, L., Hoekstra, R., et al.: Man vs. machine: differences in SPARQL queries. In: Proceedings of the Workshop on Usage Analysis and the Web of Data (USEWOD 2014) (2014)
Saleem, M., Ali, M.I., Hogan, A., Mehmood, Q., Ngomo, A.-C.N.: LSQ: the linked SPARQL queries dataset. In: Arenas, M., et al. (eds.) ISWC 2015, Part II. LNCS, vol. 9367, pp. 261–269. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-25010-6_15
Schoenfisch, J., Stuckenschmidt, H.: Analyzing real-world SPARQL queries and ontology-based data access in the context of probabilistic data. Int. J. Approx. Reason. 90, 374–388 (2017). https://doi.org/10.1016/j.ijar.2017.08.005
Shannon, C.E.: A mathematical theory of communication. Bell Syst. Tech. J. 27(3), 379–423 (1948). https://doi.org/10.1002/j.1538-7305.1948.tb01338.x
Stadler, C., Lehmann, J., Höffner, K., Auer, S.: LinkedGeoData: a core for a web of spatial open data. Semant. Web 3(4), 333–354 (2012). https://doi.org/10.3233/SW-2011-0052
Stadler, C., et al.: LSQ 2.0: a linked dataset of SPARQL query logs (Preprint) (2022). https://aidanhogan.com/docs/lsq-sparql-logs.pdf
Vrandečić, D.: WikiData: a new platform for collaborative data collection. In: Proceedings of the 21st International Conference on World Wide Web (WWW 2012), pp. 1063–1064 (2012). https://doi.org/10.1145/2187980.2188242
Wang, J., et al.: Real-time workload pattern analysis for large-scale cloud databases. arXiv e-prints arXiv:2307.02626, July 2023. https://doi.org/10.48550/arXiv.2307.02626
Xie, T., Chandola, V., Kennedy, O.: Query log compression for workload analytics. VLDB Endow. 12(3), 183–196 (2018). https://doi.org/10.14778/3291264.3291265

Acknowledgements

This work was partially supported by the PNRR project “Fostering Open Science in Social Science Research (FOSSR)” (CUP B83C22003950001) and by the PNRR MUR project PE0000013-FAIR.

Author information

Authors and Affiliations

University of Bologna, Via Zamboni 33, Bologna, Italy
Luigi Asprino
University of Bari Aldo Moro, Via Orabona 4, Bari, Italy
Miguel Ceriani
ISTC-CNR, Via S. Martino della Battaglia 44, Roma, Italy
Miguel Ceriani

Authors

Luigi Asprino
Miguel Ceriani

Corresponding author

Correspondence to Miguel Ceriani.

Editor information

Editors and Affiliations

University of Liverpool, Liverpool, UK
Terry R. Payne
University of Bologna, Bologna, Italy
Valentina Presutti
Southeast University, Nanjing, China
Guilin Qi
Universidad Politécnica de Madrid, Madrid, Spain
María Poveda-Villalón
Huawei Technologies R&D UK, Edinburgh, UK
Giorgos Stoilos
Centrum Wiskunde and Informatica, Amsterdam, The Netherlands
Laura Hollink
IT University of Copenhagen, Copenhagen, Denmark
Zoi Kaoudi
Nanjing University, Nanjing, China
Gong Cheng
Tsinghua University, Beijing, China
Juanzi Li

About this paper

Cite this paper

Asprino, L., Ceriani, M. (2023). How is Your Knowledge Graph Used: Content-Centric Analysis of SPARQL Query Logs. In: Payne, T.R., et al. The Semantic Web – ISWC 2023. ISWC 2023. Lecture Notes in Computer Science, vol 14265. Springer, Cham. https://doi.org/10.1007/978-3-031-47240-4_11

DOI: https://doi.org/10.1007/978-3-031-47240-4_11
Published: 27 October 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-47239-8
Online ISBN: 978-3-031-47240-4
eBook Packages: Computer Science, Computer Science (R0)
