Automatically Creating Benchmarks for RDF Keyword Search Evaluation

Neves, Angelo B.; Leme, Luiz André P. Paes; Izquierdo, Yenier T.; Jiménez, Javier G.; Lopes, Giseli R.; Casanova, Marco A.

doi:10.1007/s42979-022-01100-5

Original Research
Published: 30 May 2022

Volume 3, article number 312, (2022)

SN Computer Science

Abstract

Keyword search systems offer users a friendly alternative for accessing Resource Description Framework (RDF) datasets. Evaluating such systems requires adequate benchmarks consisting of RDF datasets, keyword queries, and correct answers. However, available benchmarks often have small sets of queries and incomplete sets of answers, mainly because they are constructed manually with the help of experts. The central contribution of this article is an offline method for building benchmarks automatically, allowing larger sets of queries and more complete answers. The proposed method has two parts: query generation and answer generation. Query generation extracts keywords for each entity in a selected set of relevant entities, called inducers, with heuristics guiding the extraction of possible keywords related to each inducer. Answer generation takes the queries and computes solution generators (SGs), which are subgraphs of the original dataset containing different answers to a query. Heuristics also guide this process by building SGs only for the relevant answers.
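To make the two-part structure concrete, the following is a minimal Python sketch under loud assumptions. The abstract does not specify the heuristics or how SGs are computed, so everything here is hypothetical: the toy triples, the `ex:` prefix convention for telling entities from literals, the stopword list, and the function names `generate_queries` and `candidate_subgraph`. In particular, `candidate_subgraph` is a naive keyword-matching stand-in and not the authors' solution generator algorithm.

```python
import re
from collections import defaultdict

# Toy RDF dataset as (subject, predicate, object) triples. Hypothetical data:
# objects starting with "ex:" are entities, anything else is a literal.
TRIPLES = [
    ("ex:Mauritius", "rdfs:label", "Republic of Mauritius"),
    ("ex:Mauritius", "ex:tradePartner", "ex:India"),
    ("ex:India", "rdfs:label", "Republic of India"),
    ("ex:Nigeria", "rdfs:label", "Federal Republic of Nigeria"),
]

STOPWORDS = {"of", "the", "and"}  # assumed, illustrative only


def is_literal(obj):
    return not obj.startswith("ex:")


def tokenize(text):
    """Lowercase, split on non-alphanumerics, drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def generate_queries(triples, inducers):
    """Query generation (toy): keywords come from each inducer's literals."""
    keywords = defaultdict(set)
    for s, _p, o in triples:
        if s in inducers and is_literal(o):
            keywords[s].update(tokenize(o))
    return {entity: sorted(kws) for entity, kws in keywords.items()}


def candidate_subgraph(triples, query):
    """Answer generation (naive stand-in, NOT the paper's SG computation):
    keep triples whose literal matches a keyword, plus triples that link
    two entities that both have a matching literal."""
    kws = set(query)

    def matches(obj):
        return is_literal(obj) and bool(kws & set(tokenize(obj)))

    matched = {s for s, _p, o in triples if matches(o)}
    return [
        (s, p, o)
        for s, p, o in triples
        if s in matched and (matches(o) or o in matched)
    ]


queries = generate_queries(TRIPLES, {"ex:Mauritius"})
subgraph = candidate_subgraph(TRIPLES, ["mauritius", "india"])
```

In this sketch, the inducer `ex:Mauritius` yields keywords drawn from its label, and the query with keywords "mauritius" and "india" selects the two label triples plus the triple connecting the two matched entities. A real benchmark generator would rank inducers, filter keywords with heuristics, and restrict the output to relevant answers, none of which is modelled here.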


Acknowledgements

This work was partly funded by FAPERJ under grant E-26/202.818/2017; by CAPES under grants 88881.310592-2018/01, 88881.134081/2016-01, and 88882.164913/2010-01; and by CNPq under grant 302303/2017-0. We are grateful to João Guilherme Alves Martinez for helping with the experiments.

Author information

Authors and Affiliations

Pontifical Catholic University of Rio de Janeiro, Rio de Janeiro, RJ, Brazil
Angelo B. Neves, Yenier T. Izquierdo, Javier G. Jiménez & Marco A. Casanova
Universidade Federal Fluminense, Niterói, RJ, Brazil
Luiz André P. Paes Leme
Universidade Federal do Rio de Janeiro, Rio de Janeiro, RJ, Brazil
Giseli R. Lopes

Corresponding author

Correspondence to Angelo B. Neves.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “Enterprise Information Systems” guest edited by Michal Smialek, Slimane Hammoudi, Alexander Brodsky and Joaquim Filipe.

Appendices

Appendix A. Queries for \(\mathcal{K}_{\textit{Coff-Q43}}=\{\text{``mauritius''},\text{``india''}\}\)

Answer graph pattern

Query for retrieving property values

Query to generate the Coffman’s ground truth

Appendix B. Queries for \(\mathcal{K}_{\textit{Coff-Q30'}}=\{\text{``polland''},\text{``spoken''},\text{``language''}\}\)

Answer graph pattern

Query for retrieving property values

Query to generate the Coffman’s ground truth

Appendix C. Queries for \(\mathcal{K}_{\textit{Coff-Q27}}=\{\text{``nigeria''},\text{``gdp''}\}\)

Answer graph pattern

Query for retrieving property values

Query to generate the Coffman’s ground truth

About this article

Cite this article

Neves, A.B., Leme, L.A.P.P., Izquierdo, Y.T. et al. Automatically Creating Benchmarks for RDF Keyword Search Evaluation. SN COMPUT. SCI. 3, 312 (2022). https://doi.org/10.1007/s42979-022-01100-5

Received: 14 September 2021
Accepted: 19 March 2022
Published: 30 May 2022
DOI: https://doi.org/10.1007/s42979-022-01100-5
