
Explaining similarity for SPARQL queries


Abstract

Knowledge graphs have gained significant popularity in recent years. As one of the W3C standards, SPARQL has become the de facto query language for retrieving desired data from various knowledge graphs on the Web. Accurately measuring the similarity between SPARQL queries is therefore an important and fundamental task for many query-based applications, such as query suggestion, query rewriting, and query relaxation. However, conventional SPARQL similarity computation models only provide poorly interpretable results, i.e., simple similarity scores for pairs of queries. Explaining the computed similarity scores amounts to explaining why a specific computation model produces such scores. This helps users and machines understand the results of similarity measures in different query scenarios and can benefit many downstream tasks. In this paper, we therefore focus on providing explanations for typical SPARQL similarity measures. Specifically, given the similarity scores of existing measures, we implement four explainable models based on Linear Regression, Support Vector Regression, Ridge Regression, and Random Forest Regression to assign quantitative weights to different dimensions of SPARQL features, i.e., our models explain different kinds of SPARQL similarity computation models by presenting the weights of the SPARQL feature dimensions they capture. In-depth analyses and extensive experiments on real-world datasets illustrate the effectiveness of our explainable models.
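
The idea in the abstract can be pictured with a short sketch. The Python snippet below (a minimal illustration under assumed inputs, not the authors' implementation) fits the four regressors named above to scores produced by an existing, black-box similarity measure and reads the learned coefficients (or, for the random forest, feature importances) as per-dimension weights. The feature names and the target scores are hypothetical placeholders.

```python
# Minimal sketch (not the paper's code): approximate a black-box SPARQL
# similarity measure with the four regressors named in the abstract, then
# inspect the learned per-dimension weights.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.svm import SVR
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Hypothetical per-pair SPARQL features, e.g. overlap in triple patterns,
# shared variables, common operators (FILTER/OPTIONAL/UNION), result overlap.
feature_names = ["triple_pattern_overlap", "variable_overlap",
                 "operator_overlap", "result_overlap"]
X = rng.random((500, len(feature_names)))       # one row per query pair
true_w = np.array([0.5, 0.2, 0.2, 0.1])         # pretend hidden weighting
y = X @ true_w + rng.normal(0, 0.02, 500)       # scores from an existing measure

models = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "svr": SVR(kernel="linear", C=1.0),
    "rf": RandomForestRegressor(n_estimators=100, random_state=0),
}
for name, model in models.items():
    model.fit(X, y)
    # Trees expose importances rather than coefficients.
    weights = model.feature_importances_ if name == "rf" else np.ravel(model.coef_)
    print(name, dict(zip(feature_names, np.round(weights, 3))))
```

A larger weight on a dimension indicates that the explained measure relies more heavily on that kind of SPARQL feature when scoring query pairs.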




Notes

  1. We append a ‘1’ to the end of the vector to avoid all-zero vectors (see the sketch after this list).

  2. We append a 1 to the end of the vector to avoid all-zero vectors.

  3. https://en.wikipedia.org/wiki/Covariance

  4. https://en.wikipedia.org/wiki/Pearson_correlation_coefficient

  5. https://en.wikipedia.org/wiki/Spearman%27s_rank_correlation_coefficient

  6. https://en.wikipedia.org/wiki/Variance_inflation_factor

  7. https://en.wikipedia.org/wiki/Linear_regression

  8. https://en.wikipedia.org/wiki/Random_forest

  9. https://en.wikipedia.org/wiki/Support_vector_machine#Regression

  10. https://en.wikipedia.org/wiki/Tikhonov_regularization
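
As a companion to the footnotes above, here is a small, hypothetical sketch (not code from the paper): it appends a trailing 1 to a feature vector so that a query matching no features still yields a non-zero vector (notes 1 and 2), and computes the cited diagnostics (covariance, Pearson and Spearman correlation, and the variance inflation factor; notes 3 to 6) on a toy feature matrix. All feature values are made up.

```python
# Illustrative sketch of the footnoted ingredients, with invented feature values.
import numpy as np
from scipy.stats import pearsonr, spearmanr
from sklearn.linear_model import LinearRegression

def featurize(counts):
    """Turn raw per-query feature counts into a vector, appending a trailing 1
    so that a query matching no features still yields a non-zero vector."""
    return np.append(np.asarray(counts, dtype=float), 1.0)

v = featurize([0, 0, 0])        # -> array([0., 0., 0., 1.]), safe to normalize

# Toy feature matrix: rows are query pairs, columns are feature dimensions.
rng = np.random.default_rng(1)
X = rng.random((200, 3))
X[:, 2] = 0.8 * X[:, 0] + 0.2 * rng.random(200)   # deliberately collinear column

print("covariance matrix:\n", np.cov(X, rowvar=False))
print("Pearson r  (col 0 vs 2):", pearsonr(X[:, 0], X[:, 2])[0])
print("Spearman rho (col 0 vs 2):", spearmanr(X[:, 0], X[:, 2])[0])

def vif(X, j):
    """VIF of column j: 1 / (1 - R^2) when regressing column j on the others."""
    others = np.delete(X, j, axis=1)
    r2 = LinearRegression().fit(others, X[:, j]).score(others, X[:, j])
    return 1.0 / (1.0 - r2)

print("VIF per column:", [round(vif(X, j), 2) for j in range(X.shape[1])])
```

A high VIF for a feature dimension signals collinearity with the other dimensions, which is why such diagnostics are typically checked before interpreting regression weights as explanations.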


Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grant No. 61906037, the CCF-Tencent Open Fund, and the CCF-Baidu Open Fund under Grant No. CCF BAIDU OF2020003.

Author information


Corresponding author

Correspondence to Meng Wang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article belongs to the Topical Collection: Special Issue on Explainability in the Web

Guest Editors: Guandong Xu, Hongzhi Yin, Irwin King, and Lin Li


About this article


Cite this article

Wang, M., Chen, K., Xiao, G. et al. Explaining similarity for SPARQL queries. World Wide Web 24, 1813–1835 (2021). https://doi.org/10.1007/s11280-021-00886-3

