Automatically Creating Benchmarks for RDF Keyword Search Evaluation

  • Original Research
  • SN Computer Science

Abstract

Keyword search systems offer users a friendly alternative for accessing Resource Description Framework (RDF) datasets. Evaluating such systems requires adequate benchmarks, consisting of RDF datasets, keyword queries, and the correct answers to those queries. However, available benchmarks often have small sets of queries and incomplete sets of answers, mainly because they are constructed manually with the help of experts. The central contribution of this article is an offline method for building benchmarks automatically, which allows larger sets of queries and more complete sets of answers. The proposed method has two parts: query generation and answer generation. Query generation starts from a selected set of relevant entities, called inducers, and uses heuristics to extract keywords related to each inducer. Answer generation takes the queries and computes solution generators (SGs), which are subgraphs of the original dataset that contain different answers to a query. Heuristics also guide this step, so that SGs are built only for the relevant answers.
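To make the two parts of the method more concrete, the following Python sketch illustrates the general idea on a toy graph. It is not the authors' implementation. The triples are invented for illustration, the function names (`select_inducers`, `keywords_for`, `connecting_subgraph`) are hypothetical, the inducer ranking by triple count is an assumed stand-in for the article's heuristics, and the shortest connecting path is only a simplified substitute for a solution generator.

```python
from collections import Counter, deque

# Toy dataset (invented for illustration); literals are written in double quotes.
TRIPLES = [
    ("ex:Mauritius", "rdfs:label", '"Mauritius"'),
    ("ex:India", "rdfs:label", '"India"'),
    ("ex:Org1", "rdfs:label", '"Trade Forum"'),
    ("ex:Mauritius", "ex:memberOf", "ex:Org1"),
    ("ex:India", "ex:memberOf", "ex:Org1"),
    ("ex:Brazil", "rdfs:label", '"Brazil"'),
]


def is_literal(term):
    return term.startswith('"')


def select_inducers(triples, top_n):
    """Rank entities by how many triples mention them (assumed heuristic)."""
    counts = Counter()
    for s, _, o in triples:
        counts[s] += 1
        if not is_literal(o):
            counts[o] += 1
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(counts, key=lambda e: (-counts[e], e))[:top_n]


def keywords_for(entity, triples):
    """Collect lowercase tokens from the literal values of one entity."""
    words = set()
    for s, _, o in triples:
        if s == entity and is_literal(o):
            words.update(o.strip('"').lower().split())
    return words


def match_nodes(keyword, triples):
    """Entities that have a literal value containing the keyword."""
    return {s for s, _, o in triples if is_literal(o) and keyword in o.lower()}


def connecting_subgraph(triples, kw_a, kw_b):
    """Triples on a shortest undirected path between matches of two keywords."""
    sources = match_nodes(kw_a, triples)
    targets = match_nodes(kw_b, triples)
    # Undirected adjacency over entity-to-entity triples only.
    adj = {}
    for t in triples:
        s, _, o = t
        if not is_literal(o):
            adj.setdefault(s, []).append((o, t))
            adj.setdefault(o, []).append((s, t))
    # Breadth-first search started from all source nodes at once.
    parent = {n: None for n in sources}
    queue = deque(sorted(sources))
    while queue:
        node = queue.popleft()
        if node in targets:
            path = []
            while parent[node] is not None:
                prev, triple = parent[node]
                path.append(triple)
                node = prev
            return list(reversed(path))
        for nxt, triple in adj.get(node, []):
            if nxt not in parent:
                parent[nxt] = (node, triple)
                queue.append(nxt)
    return None  # the keyword matches are not connected
```

Calling `connecting_subgraph(TRIPLES, "mauritius", "india")` on this toy data returns the two `ex:memberOf` triples that link both countries through `ex:Org1`. A real benchmark generator would have to enumerate all relevant connecting subgraphs rather than a single shortest path.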

Notes

  1. Benchmark available at https://dataverse.lib.virginia.edu/file.xhtml?fileId=1166&version=1.0

  2. http://qald.aksw.org

  3. http://qald.aksw.org

  4. http://www.dbis.informatik.uni-goettingen.de/Mondial/

References

  1. Balog K, Neumayer R (2013) A test collection for entity search in DBpedia. In: Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, New York, NY, USA, pp 737–740, https://doi.org/10.1145/2484028.2484165

  2. Bast H, Buchhold B, Haussmann E. Semantic Search on Text and Knowledge Bases. Found Trends Inf Retr. 2016;10(1):119–271. https://doi.org/10.1561/1500000032.

  3. Batista Neves A, André Paes Leme LP, Torres Izquierdo Y, Antonio Casanova M (2021) Automatic Construction of Benchmarks for RDF Keyword Search Systems Evaluation. In: Proceedings of the 23rd International Conference on Enterprise Information Systems (ICEIS’21), https://doi.org/10.5220/0010519401260137

  4. Bhalotia G, Hulgeri A, Nakhe C, Chakrabarti S, Sudarshan S (2002) Keyword searching and browsing in databases using BANKS. In: Proceedings of the 18th International Conference on Data Engineering (ICDE’02), IEEE Comput. Soc, pp 431–440, https://doi.org/10.1109/ICDE.2002.994756

  5. Bizer C, Schultz A. The Berlin SPARQL Benchmark. Int J Semant Web Inf Syst. 2009;5(2):1–24. https://doi.org/10.4018/jswis.2009040101.

  6. Coffman J, Weaver AC (2010) A framework for evaluating database keyword search strategies. In: Proceedings of the 19th ACM International Conference on Information and Knowledge Management (CIKM’10), ACM Press, New York, New York, USA, p 729, https://doi.org/10.1145/1871437.1871531

  7. Dosso D, Silvello G. Search Text to Retrieve Graphs: A Scalable RDF Keyword-Based Search System. IEEE Access. 2020;8:14089–111. https://doi.org/10.1109/ACCESS.2020.2966823.

  8. Dourado MC, de Oliveira RA, Protti F. Generating all the Steiner trees and computing Steiner intervals for a fixed number of terminals. Electron Notes Discrete Math. 2009;35:323–8. https://doi.org/10.1016/j.endm.2009.11.053.

  9. Dubey M, Banerjee D, Abdelkawi A, Lehmann J (2019) LC-QuAD 2.0: A Large Dataset for Complex Question Answering over Wikidata and DBpedia. In: Proceedings of the 18th International Semantic Web Conference (ISWC’19), pp 69–78, https://doi.org/10.1007/978-3-030-30796-7_5

  10. Elbassuoni S, Blanco R (2011) Keyword search over RDF graphs. In: Proceedings of the 20th ACM International Conference on Information and Knowledge Management (CIKM’11), ACM Press, New York, New York, USA, p 237, https://doi.org/10.1145/2063576.2063615

  11. García GM, Izquierdo YT, Menendez ES, Dartayre F, Casanova MA (2017) RDF Keyword-based Query Technology Meets a Real-World Dataset. In: Proceedings of the 20th International Conference on Extending Database Technology (EDBT’17), pp 656–667, https://doi.org/10.5441/002/edbt.2017.86

  12. Guo Y, Pan Z, Heflin J. LUBM: A Benchmark for OWL Knowledge Base Systems. J Web Semant. 2005;3(2–3):158–82. https://doi.org/10.1016/j.websem.2005.06.005.

  13. Han S, Zou L, Yu JX, Zhao D (2017) Keyword Search on RDF Graphs - A Query Graph Assembly Approach. In: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management (CIKM’17), ACM, New York, NY, USA, pp 227–236, https://doi.org/10.1145/3132847.3132957

  14. Hristidis V, Papakonstantinou Y (2002) Discover: Keyword Search in Relational Databases. In: Proceedings of the 28th International Conference on Very Large Databases (VLDB’02), Elsevier, pp 670–681, https://doi.org/10.1016/B978-155860869-6/50065-2

  15. Izquierdo YT, García GM, Menendez ES, Casanova MA, Dartayre F, Levy CH (2018) QUIOW: A Keyword-Based Query Processing Tool for RDF Datasets and Relational Databases. In: Proceedings of the 30th International Conference on Database and Expert Systems Applications (DEXA’18), vol 11030 LNCS, pp 259–269, https://doi.org/10.1007/978-3-319-98812-2_22

  16. Kimelfeld B, Sagiv Y. Efficiently enumerating results of keyword search over data graphs. Inf Syst. 2008;33(4–5):335–59. https://doi.org/10.1016/j.is.2008.01.002.

  17. Le W, Li F, Kementsietsidis A, Duan S. Scalable Keyword Search on Large RDF Data. IEEE Trans Knowl Data Eng. 2014;26(11):2774–88. https://doi.org/10.1109/TKDE.2014.2302294.

  18. Lin XQ, Ma ZM, Yan L. RDF keyword search using a type-based summary. J Inf Sci Eng. 2018;34(2):489–504. https://doi.org/10.6688/JISE.201803_34(2).0011.

  19. Menendez ES, Casanova MA, Paes Leme LAP, Boughanem M (2019) Novel Node Importance Measures to Improve Keyword Search over RDF Graphs. In: Proceedings of the 31st International Conference on Database and Expert Systems Applications (DEXA’19), vol 11707, pp 143–158, https://doi.org/10.1007/978-3-030-27618-8_11

  20. Minack E, Siberski W, Nejdl W (2009) Benchmarking Fulltext Search Performance of RDF Stores. In: Proceedings of the 6th European Semantic Web Conference (ESWC’09), vol 5554 LNCS, pp 81–95, https://doi.org/10.1007/978-3-642-02121-3_10

  21. Nunes BP, Herrera J, Taibi D, Lopes GR, Casanova MA, Dietze S (2014) SCS Connector - Quantifying and Visualising Semantic Paths Between Entity Pairs. In: Proceedings of the Satellite Events of the 11th European Semantic Web Conference (ESWC’14), pp 461–466, https://doi.org/10.1007/978-3-319-11955-7_67

  22. Oliveira PSd, Da Silva A, Moura E, De Freitas R (2020) Efficient Match-Based Candidate Network Generation for Keyword Queries over Relational Databases. IEEE Transactions on Knowledge and Data Engineering pp 1–1, https://doi.org/10.1109/TKDE.2020.2998046

  23. Oliveira Filho AdC. Benchmark para métodos de consultas por palavras-chave a bancos de dados relacionais. Tech. rep.: Universidade Federal de Goiás, Goiás; 2018.

  24. Pound J, Mika P, Zaragoza H (2010) Ad-hoc Object Retrieval in the Web of Data. In: Proceedings of the 19th International Conference on World Wide Web, pp 771–780

  25. Rihany M, Kedad Z, Lopes S (2018) Keyword search over RDF graphs using wordnet. In: Proceedings of the 1st International Conference on Big Data and Cyber-Security Intelligence (BDCSIntell’18), vol 2343, pp 75–82

  26. Tran T, Wang H, Rudolph S, Cimiano P (2009) Top-k Exploration of Query Candidates for Efficient Keyword Search on Graph-Shaped (RDF) Data. In: Proceedings of the 25th International Conference on Data Engineering (ICDE’09), IEEE, pp 405–416, https://doi.org/10.1109/ICDE.2009.119

  27. Trivedi P, Maheshwari G, Dubey M, Lehmann J (2017) LC-QuAD: A Corpus for Complex Question Answering over Knowledge Graphs. In: Proceedings of the 16th International Semantic Web Conference (ISWC’17), pp 210–218, https://doi.org/10.1007/978-3-319-68204-4_22

  28. Wen Y, Jin Y, Yuan X (2018) KAT: Keywords-to-SPARQL Translation Over RDF Graphs. In: Proceedings of the 23rd International Conference on Database Systems for Advanced Applications (DASFAA’18), vol 10827 LNCS, pp 802–810, https://doi.org/10.1007/978-3-319-91452-7_51

  29. Zenz G, Zhou X, Minack E, Siberski W, Nejdl W. From keywords to semantic queries-Incremental query construction on the semantic web. J Web Semant. 2009;7(3):166–76. https://doi.org/10.1016/j.websem.2009.07.005.

  30. Zheng W, Zou L, Peng W, Yan X, Song S, Zhao D (2016) Semantic SPARQL similarity search over RDF knowledge graphs. In: Proceedings of the 42nd VLDB (VLDB’16), vol 9, pp 840–851, https://doi.org/10.14778/2983200.2983201

  31. Zhou Q, Wang C, Xiong M, Wang H, Yu Y (2007) SPARK: Adapting Keyword Query to Semantic Search. In: Proceedings of the 6th International Semantic Web Conference (ISWC’07), Busan, Korea, vol 4825 LNCS, pp 694–707, https://doi.org/10.1007/978-3-540-76298-0_50

Acknowledgements

This work was partly funded by FAPERJ under grant E-26/202.818/2017; by CAPES under grants 88881.310592-2018/01, 88881.134081/2016-01, and 88882.164913/2010-01; and by CNPq under grant 302303/2017-0. We are grateful to João Guilherme Alves Martinez for helping with the experiments.

Author information

Corresponding author

Correspondence to Angelo B. Neves.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “Enterprise Information Systems” guest edited by Michal Smialek, Slimane Hammoudi, Alexander Brodsky and Joaquim Filipe.

Appendices

Appendix A. Queries for \(\mathcal{K}_{\textit{Coff-Q43}}=\{\text{``mauritius''},\text{``india''}\}\)

Answer graph pattern

Query for retrieving property values

Query to generate the Coffman’s ground truth

Appendix B. Queries for \(\mathcal{K}_{\textit{Coff-Q30'}}=\{\text{``polland''},\text{``spoken''},\text{``language''}\}\)

Answer graph pattern

Query for retrieving property values

Query to generate the Coffman’s ground truth

Appendix C. Queries for \(\mathcal{K}_{\textit{Coff-Q27}}=\{\text{``nigeria''},\text{``gdp''}\}\)

Answer graph pattern

Query for retrieving property values

Query to generate the Coffman’s ground truth

About this article

Cite this article

Neves, A.B., Leme, L.A.P.P., Izquierdo, Y.T. et al. Automatically Creating Benchmarks for RDF Keyword Search Evaluation. SN COMPUT. SCI. 3, 312 (2022). https://doi.org/10.1007/s42979-022-01100-5

  • DOI: https://doi.org/10.1007/s42979-022-01100-5
