Skip to main content
Log in

A cache-based method to improve query performance of linked Open Data cloud

  • Regular Paper
  • Published:
Computing Aims and scope Submit manuscript

Abstract

The proliferation of semantic big data has resulted in a large amount of content published over the Linked Open Data (LOD) cloud. Semantic Web applications consume these data by issuing SPARQL queries. One of the main challenges faced by querying the LOD web cloud on account of the inherent distributed nature of LOD is its high search latency and lack of tools to connect the SPARQL endpoints. In this paper, we propose an Adaptive Cache Replacement strategy (ACR) that aims to accelerate the overall query processing of the LOD cloud. ACR alleviates the burden on SPARQL endpoints by identifying subsequent queries learned from clients historical query patterns and caching the result of these queries. For cache replacement, we propose an exponential smoothing forecasting method to replace the less valuable cache content. In the experimental study, we evaluate the performance of the proposed approach in terms of hit rates, query time and overhead. The proposed approach is found to outperform existing state-of-the-art approaches, increase hit rates by 5.46%, and reduce the query times by 6.34%.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10

Similar content being viewed by others

Notes

  1. https://linkedgeodata.org/.

  2. https://www.w3.org/RDF/.

  3. https://www.w3.org/TR/rdf-sparql-query/.

  4. https://wiki.dbpedia.org/.

  5. https://www.w3.org/TR/sparql11-overview/.

  6. http://www.w3.org/TR/rdf-sparql-query/.

  7. http://usewod.org/usewod2014.html.

  8. https://jena.apache.org/.

References

  1. Basu A (2019) Semantic web, ontology, and linked data. In: Web services: concepts, methodologies, tools, and applications, IGI Global, pp 127–148

  2. Berners-Lee T, Hendler J, Lassila O (2001) The semantic web. Sci Am 284(5):34–43

    Article  Google Scholar 

  3. Bizer C, Heath T, Berners-Lee T (2009) Linked data-the story so far. Int J Semant Web Inf Syst 5(3):1–22

    Article  Google Scholar 

  4. Bollacker K, Evans C, Paritosh P, Sturge T, Taylor J (2008) Freebase: a collaboratively created graph database for structuring human knowledge. In: Proceedings of the 2008 ACM SIGMOD international conference on management of data, ACM, pp 1247–1250

  5. Cho J, Garcia-Molina H (2003) Estimating frequency of change. ACM Trans Internet Technol 3(3):256–290

    Article  Google Scholar 

  6. Chun S, Jung J, Lee KH (2019) Proactive policy for efficiently updating join views on continuous queries over data streams and linked data. IEEE Access 7:86226–86241

    Article  Google Scholar 

  7. Dar S, Franklin MJ, Jonsson BT, Srivastava D, Tan M et al (1996) Semantic data caching and replacement. VLDB 96:330–341

    Google Scholar 

  8. Denning PJ (1968) The working set model for program behavior. Commun ACM 11(5):323–333

    Article  MathSciNet  Google Scholar 

  9. Dividino RQ, Gröner G (2013) Which of the following SPARQL queries are similar? why? In: LD4IE@ ISWC

  10. Fernández JD, Umbrich J, Polleres A, Knuth M (2019) Evaluating query and storage strategies for RDF archives. Semant Web 10(2):247–291

    Article  Google Scholar 

  11. Gardner ES Jr (2006) Exponential smoothing: the state of the art–part ii. Int J Forecast 22(4):637–666

    Article  Google Scholar 

  12. Godfrey P, Gryz J (1999) Answering queries by semantic caches. In: International conference on database and expert systems applications, Springer, pp 485–498

  13. Gottron T (2016) Measuring the accuracy of linked data indices. arXiv preprint arXiv:1603.06068

  14. Gottron T, Knauf M, Scherp A (2015) Analysis of schema structures in the linked open data graph based on unique subject uris, pay-level domains, and vocabulary usage. Distrib Parallel Databases 33(4):515–553

    Article  Google Scholar 

  15. Hasan R (2014) Predicting SPARQL query performance and explaining linked data. In: European semantic web conference, Springer, pp 795–805

  16. Jelenković P, Radovanović A (2003) Optimizing the LRU algorithm for web caching. Charzinski J, Lehnert R, Tran-Gia P (eds) Teletraffic science and engineering, vol 5. Elsevier, pp 191–200, ISSN 1388–3437, ISBN 9780444514554

  17. Konrath M, Gottron T, Staab S, Scherp A (2012) Schemex–efficient construction of a data catalogue by stream-based indexing of linked data. Web Semant Sci Serv Agents World Wide Web 16:52–58

    Article  Google Scholar 

  18. Lee D, Choi J, Kim JH, Noh SH, Min SL, Cho Y, Kim CS (2001) LRFU: a spectrum of policies that subsumes the least recently used and least frequently used policies. IEEE Trans Comput 50(12):1352–1361

    Article  MathSciNet  Google Scholar 

  19. Lehmann J, Bühmann L (2011) Autosparql: let users query your knowledge base. In: Extended semantic web conference, Springer, pp 63–79

  20. Lehmann J, Isele R, Jakob M, Jentzsch A, Kontokostas D, Mendes PN, Hellmann S, Morsey M, Van Kleef P, Auer S et al (2015) Dbpedia-a large-scale, multilingual knowledge base extracted from wikipedia. Semant Web 6(2):167–195

    Article  Google Scholar 

  21. Levandoski JJ, Larson PÅ, Stoica R (2013) Identifying hot and cold data in main-memory databases. In: 2013 IEEE 29th international conference on data engineering (ICDE), IEEE, pp 26–37

  22. Lorey J, Naumann F (2013) Caching and prefetching strategies for SPARQL queries. In: Extended semantic web conference, Springer, pp 46–65

  23. Lorey J, Naumann F (2013) Detecting SPARQL query templates for data prefetching. In: Extended semantic web conference, Springer, pp 124–139

  24. Martin M, Unbehauen J, Auer S (2010) Improving the performance of semantic web applications with SPARQL query caching. In: Extended semantic web conference, Springer, pp 304–318

  25. Nishioka C, Scherp A (2017) Keeping linked open data caches up-to-date by predicting the life-time of RDF triples. In: Proceedings of the international conference on web intelligence, ACM, pp 73–80

  26. Papailiou N, Tsoumakos D, Karras P, Koziris N (2015) Graph-aware, workload-adaptive SPARQL query caching. In: Proceedings of the 2015 ACM SIGMOD international conference on management of data, ACM, pp 1777–1792

  27. Park HS, Jun CH (2009) A simple and fast algorithm for k-medoids clustering. Expert Syst Appl 36(2):3336–3341

    Article  Google Scholar 

  28. Podlipnig S, Böszörmenyi L (2003) A survey of web cache replacement strategies. ACM Comput Surv 35(4):374–398

    Article  Google Scholar 

  29. Ren Q, Dunham MH, Kumar V (2003) Semantic caching and query processing. IEEE Trans Knowl Data Eng 15(1):192–210

    Article  Google Scholar 

  30. Sanfeliu A, Fu KS (1983) A distance measure between attributed relational graphs for pattern recognition. IEEE Trans Syst Man Cybern 3:353–362

    Article  Google Scholar 

  31. Shu Y, Compton M, Müller H, Taylor K (2013) Towards content-aware SPARQL query caching for semantic web applications. In: International conference on web information systems engineering, Springer, pp 320–329

  32. Suchanek FM, Kasneci G, Weikum G (2007) YAGO: a core of semantic knowledge. In: Proceedings of the 16th international conference on World Wide Web, ACM, pp 697–706

  33. Umbrich J, Karnstedt M, Hogan A, Parreira JX (2012) Hybrid SPARQL queries: fresh versus fast results. In: International semantic web conference, Springer, pp 608–624

  34. Yan L, Ma R, Li D, Cheng J (2017) RDF approximate queries based on semantic similarity. Computing 99(5):481–491

    Article  MathSciNet  Google Scholar 

  35. Yang M, Wu G (2011) Caching intermediate result of SPARQL queries. In: Proceedings of the 20th international conference companion on World wide web, ACM, pp 159–160

  36. Zhang WE, Sheng QZ, Qin Y, Yao L, Shemshadi A, Taylor K (2016) SECF: Improving SPARQL querying performance with proactive fetching and caching. In: Proceedings of the 31st annual ACM symposium on applied computing, ACM, pp 362–367

  37. Zhang WE, Sheng QZ, Taylor K, Qin Y (2015) Identifying and caching hot triples for efficient RDF query processing. In: International conference on database systems for advanced applications, Springer, pp 259–274

  38. Zhang WE, Sheng QZ, Yao L, Taylor K, Shemshadi A, Qin Y (2018) A learning-based framework for improving querying on web interfaces of curated knowledge bases. ACM Trans Internet Technol 18(3):35

    Google Scholar 

  39. Zheng W, Zou L, Peng W, Yan X, Song S, Zhao D (2016) Semantic SPARQL similarity search over RDF knowledge graphs. Proc VLDB Endow 9(11):840–851

    Article  Google Scholar 

Download references

Acknowledgements

This research was supported by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2017-0-01629) supervised by the IITP (Institute for Information & communications Technology Promotion). This work was supported by the Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIT) (No. 2017-0-00655), NRF-2016K1A3A7A03951968 & NRF-2019R1A2C2090504.

Author information

Authors and Affiliations

Authors

Corresponding authors

Correspondence to Jaehun Bang or Sungyoung Lee.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Akhtar, U., Sant’Anna, A., Jihn, CH. et al. A cache-based method to improve query performance of linked Open Data cloud. Computing 102, 1743–1763 (2020). https://doi.org/10.1007/s00607-020-00814-9

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00607-020-00814-9

Keywords

Mathematics Subject Classification

Navigation