DOI: 10.1145/3460210.3493565
research-article

Predicting SPARQL Query Dynamics

Published: 02 December 2021

Abstract

Given historical versions of an RDF graph, we propose and compare several methods to predict whether the results of a SPARQL query will change in the next version. Unsurprisingly, we find that the best results for this task are achieved by considering the full history of results for the query over previous versions of the graph. However, given a previously unseen query, producing historical results requires costly offline maintenance of previous versions of the data, and costly online computation of the query results over these previous versions. This prompts us to explore more lightweight alternatives that rely on features computed from the query and statistical summaries of historical versions of the graph. We evaluate the quality of the predictions produced over weekly snapshots of Wikidata and daily snapshots of DBpedia. Our results provide insights into the trade-offs for predicting SPARQL query dynamics: a detailed history of changes for a query's results enables much more accurate predictions, but incurs higher overhead than more lightweight alternatives.
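To make the history-based setting concrete, here is a minimal sketch of the prediction task, not the method evaluated in the paper. It assumes the results of one query have already been computed over each historical version, and it uses a simple frequency baseline chosen purely for illustration. The names `change_history`, `predict_change`, and the `threshold` parameter are hypothetical.

```python
from typing import FrozenSet, List, Sequence, Tuple

Row = Tuple[str, ...]          # one solution (row) of a query result
ResultSet = FrozenSet[Row]     # results of the query over one graph version


def change_history(results: Sequence[ResultSet]) -> List[bool]:
    """True at position i if the results differ between version i and i+1."""
    return [prev != curr for prev, curr in zip(results, results[1:])]


def predict_change(results: Sequence[ResultSet], threshold: float = 0.5) -> bool:
    """Hypothetical baseline (an assumption, not the paper's model): predict a
    change for the next version if the fraction of past version-to-version
    changes is at least `threshold`."""
    history = change_history(results)
    if not history:            # fewer than two versions: no evidence of change
        return False
    return sum(history) / len(history) >= threshold


# Toy example: results of one query over four versions of a graph.
versions = [
    frozenset({("Q1",), ("Q2",)}),
    frozenset({("Q1",), ("Q2",)}),
    frozenset({("Q1",), ("Q2",), ("Q3",)}),
    frozenset({("Q1",), ("Q3",)}),
]
history = change_history(versions)
prediction = predict_change(versions)
```

The sketch also shows where the overhead described above comes from: building `versions` for a previously unseen query requires storing every past version of the graph and evaluating the query over each one.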

Cited By

  • (2024) LSQ 2.0: A linked dataset of SPARQL query logs. Semantic Web 15:1 (167-189). DOI: 10.3233/SW-223015. Online publication date: 12-Jan-2024

Published In

K-CAP '21: Proceedings of the 11th Knowledge Capture Conference
December 2021
300 pages
ISBN:9781450384575
DOI:10.1145/3460210
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. dynamics
  2. linked data
  3. rdf
  4. sparql

Funding Sources

  • ANID
  • FONDECYT
  • CONICYT

Conference

K-CAP '21
K-CAP '21: Knowledge Capture Conference
December 2 - 3, 2021
Virtual Event, USA

Acceptance Rates

Overall Acceptance Rate 55 of 198 submissions, 28%

Article Metrics

  • Downloads (Last 12 months): 13
  • Downloads (Last 6 weeks): 1
Reflects downloads up to 10 Feb 2025
