Abstract
Keyword search systems offer users a friendly alternative for accessing Resource Description Framework (RDF) datasets. Evaluating such systems requires adequate benchmarks, consisting of RDF datasets, keyword queries, and their correct answers. However, available benchmarks often have small sets of queries and incomplete sets of answers, mainly because they are constructed manually with the help of experts. The central contribution of this article is an offline method to build benchmarks automatically, which allows larger sets of queries and more complete sets of answers. The proposed method has two parts: query generation and answer generation. Query generation first selects a set of relevant entities, called inducers, and then extracts keywords for each inducer, with heuristics guiding which related keywords are extracted. Answer generation takes the generated queries and computes solution generators (SGs), which are subgraphs of the original dataset that contain the different answers to a query. Heuristics also guide this step, so that SGs are built only for the relevant answers.
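The abstract only outlines the two stages of the method. As a rough illustration of how they fit together, the Python sketch below runs both on a toy graph. It is a minimal sketch under stated assumptions, not the authors' implementation: the triples and `ex:` IRIs are invented, inducers are picked by node degree as a stand-in for the relevance criterion, each query joins an inducer's label tokens with one neighbor's label tokens, and a "solution generator" is approximated as the union of breadth-first shortest paths from the inducer to the nearest entity matching each keyword. All function names are hypothetical.

```python
from collections import deque

# Minimal sketch: every heuristic below is a hypothetical stand-in, not the article's.
# Toy RDF graph (invented for illustration) as (subject, predicate, object) triples.
LABEL = "rdfs:label"
TRIPLES = [
    ("ex:Poland", LABEL, "Poland"),
    ("ex:Polish", LABEL, "Polish language"),
    ("ex:Germany", LABEL, "Germany"),
    ("ex:German", LABEL, "German language"),
    ("ex:Poland", "ex:spokenLanguage", "ex:Polish"),
    ("ex:Germany", "ex:spokenLanguage", "ex:German"),
    ("ex:Poland", "ex:borders", "ex:Germany"),
]


def label_tokens(triples):
    """Map each entity to the lower-cased tokens of its rdfs:label literal."""
    return {s: o.lower().split() for s, p, o in triples if p == LABEL}


def adjacency(triples):
    """Undirected adjacency list over entity-to-entity triples (labels excluded)."""
    adj = {}
    for s, p, o in triples:
        if p != LABEL:
            adj.setdefault(s, []).append((p, o))
            adj.setdefault(o, []).append((p, s))
    return adj


def select_inducers(adj, k):
    """Hypothetical relevance heuristic: the k entities of highest degree."""
    return sorted(adj, key=lambda e: (-len(adj[e]), e))[:k]


def generate_queries(inducer, adj, lab):
    """Hypothetical keyword heuristic: inducer label tokens plus one neighbor's tokens."""
    queries = []
    for _, nbr in adj.get(inducer, []):
        query = frozenset(lab.get(inducer, [])) | frozenset(lab.get(nbr, []))
        if query not in queries:
            queries.append(query)
    return queries


def shortest_path(adj, src, dst):
    """Breadth-first search; returns the edges of a shortest src-dst path, or None."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            break
        for p, v in adj.get(u, []):
            if v not in prev:
                prev[v] = (u, p)
                queue.append(v)
    if dst not in prev:
        return None
    path, node = [], dst
    while prev[node] is not None:
        u, p = prev[node]
        path.append((u, p, node))  # edge in traversal order, not necessarily RDF direction
        node = u
    return path[::-1]


def solution_generator(query, root, adj, lab):
    """Union of shortest paths from root to the nearest entity matching each keyword.

    Returns None when some keyword has no match reachable from root.
    """
    edges = set()
    for kw in sorted(query):
        matches = sorted(e for e, toks in lab.items() if kw in toks)
        paths = [p for p in (shortest_path(adj, root, m) for m in matches) if p is not None]
        if not paths:
            return None
        edges.update(min(paths, key=len))
    return edges


lab = label_tokens(TRIPLES)
adj = adjacency(TRIPLES)
for inducer in select_inducers(adj, k=2):
    for q in generate_queries(inducer, adj, lab):
        print(inducer, sorted(q), solution_generator(q, inducer, adj, lab))
```

Rooting every path at the inducer is one simple way to restrict the computation to answers related to the entity that induced the query, in the spirit of building SGs only for relevant answers; unlike a full SG, this sketch keeps a single answer per query rather than all alternative answers.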






References
Balog K, Neumayer R (2013) A test collection for entity search in DBpedia. In: Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, New York, NY, USA, pp 737–740, https://doi.org/10.1145/2484028.2484165
Bast H, Buchhold B, Haussmann E. Semantic Search on Text and Knowledge Bases. Found Trends Inf Retr. 2016;10(1):119–271. https://doi.org/10.1561/1500000032.
Batista Neves A, André Paes Leme LP, Torres Izquierdo Y, Antonio Casanova M (2021) Automatic Construction of Benchmarks for RDF Keyword Search Systems Evaluation. In: Proceedings of the 23rd International Conference on Enterprise Information Systems (ICEIS’21), https://doi.org/10.5220/0010519401260137
Bhalotia G, Hulgeri A, Nakhe C, Chakrabarti S, Sudarshan S (2002) Keyword searching and browsing in databases using BANKS. In: Proceedings of the 18th International Conference on Data Engineering (ICDE’02), IEEE Comput. Soc, pp 431–440, https://doi.org/10.1109/ICDE.2002.994756
Bizer C, Schultz A. The Berlin SPARQL Benchmark. Int J Semant Web Inf Syst. 2009;5(2):1–24. https://doi.org/10.4018/jswis.2009040101.
Coffman J, Weaver AC (2010) A framework for evaluating database keyword search strategies. In: Proceedings of the 19th ACM International Conference on Information and Knowledge Management (CIKM’10), ACM Press, New York, New York, USA, p 729, https://doi.org/10.1145/1871437.1871531
Dosso D, Silvello G. Search Text to Retrieve Graphs: A Scalable RDF Keyword-Based Search System. IEEE Access. 2020;8:14089–111. https://doi.org/10.1109/ACCESS.2020.2966823.
Dourado MC, de Oliveira RA, Protti F. Generating all the Steiner trees and computing Steiner intervals for a fixed number of terminals. Electron Notes Discrete Math. 2009;35:323–8. https://doi.org/10.1016/j.endm.2009.11.053.
Dubey M, Banerjee D, Abdelkawi A, Lehmann J (2019) LC-QuAD 2.0: A Large Dataset for Complex Question Answering over Wikidata and DBpedia. In: Proceedings of the 18th International Semantic Web Conference (ISWC’19), pp 69–78, https://doi.org/10.1007/978-3-030-30796-7_5
Elbassuoni S, Blanco R (2011) Keyword search over RDF graphs. In: Proceedings of the 20th ACM International Conference on Information and Knowledge Management (CIKM’11), ACM Press, New York, New York, USA, p 237, https://doi.org/10.1145/2063576.2063615
García GM, Izquierdo YT, Menendez ES, Dartayre F, Casanova MA (2017) RDF Keyword-based Query Technology Meets a Real-World Dataset. In: Proceedings of the 20th International Conference on Extending Database Technology (EDBT’17), pp 656–667, https://doi.org/10.5441/002/edbt.2017.86
Guo Y, Pan Z, Heflin J. LUBM: A Benchmark for OWL Knowledge Base Systems. J Web Semant. 2005;3(2–3):158–82. https://doi.org/10.1016/j.websem.2005.06.005.
Han S, Zou L, Yu JX, Zhao D (2017) Keyword Search on RDF Graphs - A Query Graph Assembly Approach. In: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management (CIKM’17), ACM, New York, NY, USA, pp 227–236, https://doi.org/10.1145/3132847.3132957
Hristidis V, Papakonstantinou Y (2002) DISCOVER: Keyword Search in Relational Databases. In: Proceedings of the 28th International Conference on Very Large Data Bases (VLDB’02), Elsevier, pp 670–681, https://doi.org/10.1016/B978-155860869-6/50065-2
Izquierdo YT, García GM, Menendez ES, Casanova MA, Dartayre F, Levy CH (2018) QUIOW: A Keyword-Based Query Processing Tool for RDF Datasets and Relational Databases. In: Proceedings of the 29th International Conference on Database and Expert Systems Applications (DEXA’18), vol 11030 LNCS, pp 259–269, https://doi.org/10.1007/978-3-319-98812-2_22
Kimelfeld B, Sagiv Y. Efficiently enumerating results of keyword search over data graphs. Inf Syst. 2008;33(4–5):335–59. https://doi.org/10.1016/j.is.2008.01.002.
Le W, Li F, Kementsietsidis A, Duan S. Scalable Keyword Search on Large RDF Data. IEEE Trans Knowl Data Eng. 2014;26(11):2774–88. https://doi.org/10.1109/TKDE.2014.2302294.
Lin XQ, Ma ZM, Yan L. RDF keyword search using a type-based summary. J Inf Sci Eng. 2018;34(2):489–504. https://doi.org/10.6688/JISE.201803_34(2).0011.
Menendez ES, Casanova MA, Paes Leme LAP, Boughanem M (2019) Novel Node Importance Measures to Improve Keyword Search over RDF Graphs. In: Proceedings of the 30th International Conference on Database and Expert Systems Applications (DEXA’19), vol 11707, pp 143–158, https://doi.org/10.1007/978-3-030-27618-8_11
Minack E, Siberski W, Nejdl W (2009) Benchmarking Fulltext Search Performance of RDF Stores. In: Proceedings of the 6th European Semantic Web Conference (ESWC’09), vol 5554 LNCS, pp 81–95, https://doi.org/10.1007/978-3-642-02121-3_10
Nunes BP, Herrera J, Taibi D, Lopes GR, Casanova MA, Dietze S (2014) SCS Connector - Quantifying and Visualising Semantic Paths Between Entity Pairs. In: Proceedings of the Satellite Events of the 11th European Semantic Web Conference (ESWC’14), pp 461–466, https://doi.org/10.1007/978-3-319-11955-7_67
Oliveira PSd, Da Silva A, Moura E, De Freitas R (2020) Efficient Match-Based Candidate Network Generation for Keyword Queries over Relational Databases. IEEE Transactions on Knowledge and Data Engineering pp 1–1, https://doi.org/10.1109/TKDE.2020.2998046
Oliveira Filho AdC. Benchmark para métodos de consultas por palavras-chave a bancos de dados relacionais [Benchmark for keyword query methods over relational databases]. Tech. rep.: Universidade Federal de Goiás, Goiás; 2018.
Pound J, Mika P, Zaragoza H (2010) Ad-hoc Object Retrieval in the Web of Data. In: Proceedings of the 19th International Conference on World Wide Web, pp 771–780
Rihany M, Kedad Z, Lopes S (2018) Keyword search over RDF graphs using WordNet. In: Proceedings of the 1st International Conference on Big Data and Cyber-Security Intelligence (BDCSIntell’18), vol 2343, pp 75–82
Tran T, Wang H, Rudolph S, Cimiano P (2009) Top-k Exploration of Query Candidates for Efficient Keyword Search on Graph-Shaped (RDF) Data. In: Proceedings of the 25th International Conference on Data Engineering (ICDE’09), IEEE, pp 405–416, https://doi.org/10.1109/ICDE.2009.119
Trivedi P, Maheshwari G, Dubey M, Lehmann J (2017) LC-QuAD: A Corpus for Complex Question Answering over Knowledge Graphs. In: Proceedings of the 16th International Semantic Web Conference (ISWC’17), pp 210–218, https://doi.org/10.1007/978-3-319-68204-4_22
Wen Y, Jin Y, Yuan X (2018) KAT: Keywords-to-SPARQL Translation Over RDF Graphs. In: Proceedings of the 23rd International Conference on Database Systems for Advanced Applications (DASFAA’18), vol 10827 LNCS, pp 802–810, https://doi.org/10.1007/978-3-319-91452-7_51
Zenz G, Zhou X, Minack E, Siberski W, Nejdl W. From keywords to semantic queries: Incremental query construction on the semantic web. J Web Semant. 2009;7(3):166–76. https://doi.org/10.1016/j.websem.2009.07.005.
Zheng W, Zou L, Peng W, Yan X, Song S, Zhao D (2016) Semantic SPARQL similarity search over RDF knowledge graphs. In: Proceedings of the 42nd VLDB (VLDB’16), vol 9, pp 840–851, https://doi.org/10.14778/2983200.2983201
Zhou Q, Wang C, Xiong M, Wang H, Yu Y (2007) SPARK: Adapting Keyword Query to Semantic Search. In: Proceedings of the 6th International Semantic Web Conference (ISWC’07), Busan, Korea, vol 4825 LNCS, pp 694–707, https://doi.org/10.1007/978-3-540-76298-0_50
Acknowledgements
This work was partly funded by FAPERJ under grant E-26/202.818/2017; by CAPES under grants 88881.310592-2018/01, 88881.134081/2016-01, and 88882.164913/2010-01; and by CNPq under grant 302303/2017-0. We are grateful to João Guilherme Alves Martinez for helping with the experiments.
Ethics declarations
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This article is part of the topical collection “Enterprise Information Systems” guest edited by Michal Smialek, Slimane Hammoudi, Alexander Brodsky and Joaquim Filipe.
Appendices
Appendix A. Queries for \(\mathcal{K}_{\textit{Coff-Q43}} = \{\text{``mauritius''}, \text{``india''}\}\)
Answer graph pattern
Query for retrieving property values
Query to generate the Coffman’s ground truth
Appendix B. Queries for \(\mathcal{K}_{\textit{Coff-Q30'}} = \{\text{``polland''}, \text{``spoken''}, \text{``language''}\}\)
Answer graph pattern
Query for retrieving property values
Query to generate the Coffman’s ground truth
Appendix C. Queries for \(\mathcal{K}_{\textit{Coff-Q27}} = \{\text{``nigeria''}, \text{``gdp''}\}\)
Answer graph pattern
Query for retrieving property values
Query to generate the Coffman’s ground truth
Cite this article
Neves, A.B., Leme, L.A.P.P., Izquierdo, Y.T. et al. Automatically Creating Benchmarks for RDF Keyword Search Evaluation. SN COMPUT. SCI. 3, 312 (2022). https://doi.org/10.1007/s42979-022-01100-5
DOI: https://doi.org/10.1007/s42979-022-01100-5