
Explaining similarity for SPARQL queries


Abstract

Knowledge graphs have gained significant popularity in recent years. As one of the W3C standards, SPARQL has become the de facto query language for retrieving desired data from various knowledge graphs on the Web. Accurately measuring the similarity between SPARQL queries is therefore an important and fundamental task for many query-based applications, such as query suggestion, query rewriting, and query relaxation. However, conventional SPARQL similarity computation models only provide poorly interpretable results, i.e., simple similarity scores for pairs of queries. Explaining the computed similarity scores amounts to explaining why a specific computation model produces such scores. This helps users and machines understand the results of similarity measures in different query scenarios and can benefit many downstream tasks. In this paper, we therefore focus on providing explanations for typical SPARQL similarity measures. Specifically, given the similarity scores of existing measures, we implement four explainable models based on Linear Regression, Support Vector Regression, Ridge Regression, and Random Forest Regression to assign quantitative weights to different dimensions of SPARQL features, i.e., our models explain different kinds of SPARQL similarity computation models by presenting the weights of the SPARQL feature dimensions they capture. In-depth analyses and extensive experiments on real-world datasets illustrate the effectiveness of our explainable models.
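
The idea in the abstract can be pictured with a short sketch. The Python snippet below (a minimal illustration under assumed inputs, not the authors' implementation) fits the four regressors named above to scores produced by an existing, black-box similarity measure and reads the learned coefficients (or, for the random forest, feature importances) as per-dimension weights. The feature names and the target scores are hypothetical placeholders.

```python
# Minimal sketch (not the paper's code): approximate a black-box SPARQL
# similarity measure with the four regressors named in the abstract, then
# inspect the learned per-dimension weights.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.svm import SVR
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Hypothetical per-pair SPARQL features, e.g. overlap in triple patterns,
# shared variables, common operators (FILTER/OPTIONAL/UNION), result overlap.
feature_names = ["triple_pattern_overlap", "variable_overlap",
                 "operator_overlap", "result_overlap"]
X = rng.random((500, len(feature_names)))       # one row per query pair
true_w = np.array([0.5, 0.2, 0.2, 0.1])         # pretend hidden weighting
y = X @ true_w + rng.normal(0, 0.02, 500)       # scores from an existing measure

models = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "svr": SVR(kernel="linear", C=1.0),
    "rf": RandomForestRegressor(n_estimators=100, random_state=0),
}
for name, model in models.items():
    model.fit(X, y)
    # Trees expose importances rather than coefficients.
    weights = model.feature_importances_ if name == "rf" else np.ravel(model.coef_)
    print(name, dict(zip(feature_names, np.round(weights, 3))))
```

A larger weight on a dimension indicates that the explained measure relies more heavily on that kind of SPARQL feature when scoring query pairs.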




Notes

  1. We append a ‘1’ to the end of the vector to avoid all-zero vectors (see the sketch after this list).

  2. We append a 1 to the end of the vector to avoid all-zero vectors.

  3. https://en.wikipedia.org/wiki/Covariance

  4. https://en.wikipedia.org/wiki/Pearson_correlation_coefficient

  5. https://en.wikipedia.org/wiki/Spearman%27s_rank_correlation_coefficient

  6. https://en.wikipedia.org/wiki/Variance_inflation_factor

  7. https://en.wikipedia.org/wiki/Linear_regression

  8. https://en.wikipedia.org/wiki/Random_forest

  9. https://en.wikipedia.org/wiki/Support_vector_machine#Regression

  10. https://en.wikipedia.org/wiki/Tikhonov_regularization
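
As a companion to the footnotes above, here is a small, hypothetical sketch (not code from the paper): it appends a trailing 1 to a feature vector so that a query matching no features still yields a non-zero vector (notes 1 and 2), and computes the cited diagnostics (covariance, Pearson and Spearman correlation, and the variance inflation factor; notes 3 to 6) on a toy feature matrix. All feature values are made up.

```python
# Illustrative sketch of the footnoted ingredients, with invented feature values.
import numpy as np
from scipy.stats import pearsonr, spearmanr
from sklearn.linear_model import LinearRegression

def featurize(counts):
    """Turn raw per-query feature counts into a vector, appending a trailing 1
    so that a query matching no features still yields a non-zero vector."""
    return np.append(np.asarray(counts, dtype=float), 1.0)

v = featurize([0, 0, 0])        # -> array([0., 0., 0., 1.]), safe to normalize

# Toy feature matrix: rows are query pairs, columns are feature dimensions.
rng = np.random.default_rng(1)
X = rng.random((200, 3))
X[:, 2] = 0.8 * X[:, 0] + 0.2 * rng.random(200)   # deliberately collinear column

print("covariance matrix:\n", np.cov(X, rowvar=False))
print("Pearson r  (col 0 vs 2):", pearsonr(X[:, 0], X[:, 2])[0])
print("Spearman rho (col 0 vs 2):", spearmanr(X[:, 0], X[:, 2])[0])

def vif(X, j):
    """VIF of column j: 1 / (1 - R^2) when regressing column j on the others."""
    others = np.delete(X, j, axis=1)
    r2 = LinearRegression().fit(others, X[:, j]).score(others, X[:, j])
    return 1.0 / (1.0 - r2)

print("VIF per column:", [round(vif(X, j), 2) for j in range(X.shape[1])])
```

A high VIF for a feature dimension signals collinearity with the other dimensions, which is why such diagnostics are typically checked before interpreting regression weights as explanations.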


Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grant No. 61906037, the CCF-Tencent Open Fund, and the CCF-Baidu Open Fund under Grant No. CCF BAIDU OF2020003.

Author information


Corresponding author

Correspondence to Meng Wang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article belongs to the Topical Collection: Special Issue on Explainability in the Web

Guest Editors: Guandong Xu, Hongzhi Yin, Irwin King, and Lin Li


About this article


Cite this article

Wang, M., Chen, K., Xiao, G. et al. Explaining similarity for SPARQL queries. World Wide Web 24, 1813–1835 (2021). https://doi.org/10.1007/s11280-021-00886-3

