Indexing Data on the Web: A Comparison of Schema-Level Indices for Data Search

Blume, Till; Scherp, Ansgar

doi:10.1007/978-3-030-59051-2_18

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12392)

Included in the following conference series:

International Conference on Database and Expert Systems Applications

Abstract

Indexing the Web of Data offers many opportunities, in particular, to find and explore data sources. One major design decision when indexing the Web of Data is to find a suitable index model, i.e., how to index and summarize data. Various efforts have been conducted to develop specific index models for a given task. With each index model designed, implemented, and evaluated independently, it remains difficult to judge whether an approach generalizes well to another task, set of queries, or dataset. In this work, we empirically evaluate six representative index models with unique feature combinations. Among them is a new index model incorporating inferencing over RDFS and owl:sameAs. We implement all index models for the first time into a single, stream-based framework. We evaluate variations of the index models considering sub-graphs of size 0, 1, and 2 hops on two large, real-world datasets. We evaluate the quality of the indices regarding the compression ratio, summarization ratio, and F1-score denoting the approximation quality of the stream-based index computation. The experiments reveal huge variations in compression ratio, summarization ratio, and approximation quality for different index models, queries, and datasets. However, we observe meaningful correlations in the results that help to determine the right index model for a given task, type of query, and dataset.
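To make the notion of a schema-level index concrete, the following is a minimal sketch, not the authors' framework. It groups subjects by the set of rdf:type values they carry and reports how many schema elements summarize how many instances. The function names, the toy triples, and the ratio definition (schema elements divided by indexed instances) are illustrative assumptions; the paper's formal definitions may differ.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"  # abbreviated predicate name, assumed for this toy example


def build_type_index(triples):
    """Group subjects by the set of rdf:type values attached to them.

    Each distinct type set acts as one schema element. Subjects that
    carry no rdf:type statement are not indexed in this sketch.
    """
    types_of = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == RDF_TYPE:
            types_of[subject].add(obj)

    index = defaultdict(set)
    for subject, types in types_of.items():
        index[frozenset(types)].add(subject)
    return dict(index)


def summarization_ratio(index):
    """Assumed definition: schema elements divided by indexed instances."""
    instances = sum(len(members) for members in index.values())
    return len(index) / instances if instances else 0.0


# Hypothetical data: four typed instances, one untyped relationship triple.
triples = [
    ("ex:alice", RDF_TYPE, "foaf:Person"),
    ("ex:bob", RDF_TYPE, "foaf:Person"),
    ("ex:carol", RDF_TYPE, "foaf:Person"),
    ("ex:carol", RDF_TYPE, "ex:Author"),
    ("ex:paper1", RDF_TYPE, "ex:Paper"),
    ("ex:alice", "foaf:knows", "ex:bob"),
]

index = build_type_index(triples)
ratio = summarization_ratio(index)
```

Here the four typed instances collapse into three schema elements, since ex:alice and ex:bob share the same type set. Under the assumed definition, a lower ratio means that more instances are summarized by each schema element.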

Acknowledgment

This research was co-financed by the EU H2020 project MOVING (http://www.moving-project.eu/) under contract no. 693092.

Author information

Authors and Affiliations

Kiel University, Kiel, Germany
Till Blume
Ulm University, Ulm, Germany
Ansgar Scherp

Corresponding author

Correspondence to Till Blume.

Editor information

Editors and Affiliations

Clausthal University of Technology, Clausthal-Zellerfeld, Germany
Sven Hartmann
Johannes Kepler University of Linz, Linz, Austria
Josef Küng
Johannes Kepler University of Linz, Linz, Austria
Gabriele Kotsis
IFS, Vienna University of Technology, Vienna, Wien, Austria
A Min Tjoa
Johannes Kepler University of Linz, Linz, Austria
Ismail Khalil

Cite this paper

Blume, T., Scherp, A. (2020). Indexing Data on the Web: A Comparison of Schema-Level Indices for Data Search. In: Hartmann, S., Küng, J., Kotsis, G., Tjoa, A.M., Khalil, I. (eds) Database and Expert Systems Applications. DEXA 2020. Lecture Notes in Computer Science, vol 12392. Springer, Cham. https://doi.org/10.1007/978-3-030-59051-2_18

DOI: https://doi.org/10.1007/978-3-030-59051-2_18
Published: 08 February 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-59050-5
Online ISBN: 978-3-030-59051-2
eBook Packages: Computer Science, Computer Science (R0)
