Schema-level Index Models for Web Data Search (pp047-063)
Ansgar Scherp and Till Blume
doi:
https://doi.org/10.26421/JDI2.1-3
Abstract:
Indexing the Web of Data offers many opportunities, in particular, to find
and explore data sources. One major design decision when indexing
the Web of Data is to find a suitable index model, i.e., how to
index and summarize data. Various efforts have been made to
develop specific index models for a given task. With each index
model designed, implemented, and evaluated independently, it remains
difficult to judge whether an approach generalizes well to another
task, set of queries, or dataset. In this work, we empirically
evaluate six representative index models with unique feature
combinations. Among them is a new index model incorporating
inferencing over RDFS and \texttt{owl:sameAs}.
For the first time, we implement all index models in a single,
stream-based framework. We evaluate variations of the index models
considering sub-graphs of size $0$, $1$, and $2$ hops on two large,
real-world datasets. We evaluate
the quality of the indices regarding the compression ratio,
summarization ratio, and F1-score denoting the approximation quality
of the stream-based index computation. The experiments reveal huge
variations in compression ratio, summarization ratio, and
approximation quality for different index models, queries, and
datasets. However, we observe meaningful correlations in the results
that help to determine the right index model for a given task, type
of query, and dataset.
Key words:
Graph Summarization; Schema-level Graph Indices; Data Search