Abstract
We present a general encoding scheme for the efficient management of spatial RDF data. The scheme approximates the geometries of the RDF entities inside their (integer) IDs and can be used, along with several operators and optimizations we introduce, to accelerate queries with spatial predicates and to re-encode entities dynamically in case of updates. We implement our ideas in SRX, a system built on top of the popular RDF-3X system. SRX extends RDF-3X with support for three types of spatial queries: range selections (e.g., find entities within a given polygon), spatial joins (e.g., find pairs of entities whose locations are close to each other), and spatial k-nearest neighbors (e.g., find the three entities closest to a given location). We evaluate SRX on spatial queries and updates with real RDF data, and we also compare its performance with the latest versions of three popular RDF stores. The results show SRX's superior performance over the competitors; compared to RDF-3X, SRX is faster on queries with spatial predicates while incurring little overhead during updates.
Notes
For entities that have point geometries, the spatial selection can be evaluated using only the R-tree. If the entities have non-point geometries, the R-tree search may result in false positives; thus, the final results of the spatial filter are confirmed by retrieving the exact geometries from the dictionary.
If the spatial join inputs are very small, we simply fetch the geometries of the input entity sets and do a nested-loop spatial join.
Most spatial predicates, when translated to the grid-based approximations of the encoding, involve distance computations and/or cheap geometry intersection tests.
Recall that the inputs are sorted by ID and that entities may be encoded at different granularities due to data skew or geometry extents. Therefore, using the cell ID of \(e_r\) alone is not sufficient, and we have to use the minChildID of \(e_r\).
The fact that the entities arrive from the inputs sorted by their IDs guarantees that they are also sorted based on their minChildIDs.
If no spatial entity in the database falls in \(c_p\) or one of its parent cells, we use as the limit the first free (i.e., the minimum) spatial ID for an entity in \(c_p\).
We included only a small, separate cache of 40 KB for the R-tree. Since the OS already caches R-tree pages, we kept this cache small in order to reduce the effect of double caching by the SaIL library.
This check was not included in the version of RDF-3X we had, but we added it for consistency.
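The first two notes can be illustrated with a minimal Python sketch. It is not SRX's implementation: the names `Entity`, `range_select`, and `nested_loop_distance_join` are hypothetical, geometries are simplified to discs (radius 0 for a point), a rectangular query window replaces the polygon, and a linear scan over bounding rectangles stands in for the R-tree search.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Entity:
    """Hypothetical spatial entity: a disc; radius 0 means a point geometry."""
    id: int
    x: float
    y: float
    radius: float = 0.0

    def mbr(self):
        # Minimum bounding rectangle (xmin, ymin, xmax, ymax), as an index would store it.
        return (self.x - self.radius, self.y - self.radius,
                self.x + self.radius, self.y + self.radius)


def rects_intersect(a, b):
    # Two axis-aligned rectangles overlap iff they overlap on both axes.
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def disc_intersects_rect(e, rect):
    # Clamp the centre to the rectangle to find the rectangle's closest point.
    cx = min(max(e.x, rect[0]), rect[2])
    cy = min(max(e.y, rect[1]), rect[3])
    return hypot(e.x - cx, e.y - cy) <= e.radius


def range_select(entities, window):
    """Filter on bounding rectangles (assumed stand-in for the R-tree),
    then confirm non-point geometries against the exact geometry."""
    candidates = [e for e in entities if rects_intersect(e.mbr(), window)]
    return [e.id for e in candidates
            if e.radius == 0.0 or disc_intersects_rect(e, window)]


def geometry_distance(a, b):
    # Distance between two discs; 0 if they touch or overlap.
    return max(0.0, hypot(a.x - b.x, a.y - b.y) - a.radius - b.radius)


def nested_loop_distance_join(left, right, eps):
    """Compare every pair; reasonable only when both inputs are very small."""
    return [(a.id, b.id) for a in left for b in right
            if geometry_distance(a, b) <= eps]


entities = [
    Entity(1, 1.0, 1.0),       # point inside the window
    Entity(2, 3.0, 3.0, 1.2),  # bounding rectangle overlaps the window, disc does not
    Entity(3, 2.5, 1.0, 1.0),  # disc that really overlaps the window
    Entity(4, 9.0, 9.0),       # point far away
]
window = (0.0, 0.0, 2.0, 2.0)
selected = range_select(entities, window)
pairs = nested_loop_distance_join(entities[:2], entities[2:], eps=1.0)
print(selected)
print(pairs)
```

With this sample data, entity 2 passes the bounding-rectangle filter but is rejected by the exact test, so the selection returns entities 1 and 3. Points skip the refinement step, because for a point the bounding rectangle and the geometry coincide.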
Acknowledgements
We acknowledge support of this work by the project “Moving from Big Data Management to Data Science” (MIS 5002437/3) which is implemented under the Action “Reinforcement of the Research and Innovation Infrastructure,” funded by the Operational Programme “Competitiveness, Entrepreneurship and Innovation” (NSRF 2014–2020) and co-financed by Greece and the European Union (European Regional Development Fund). This work is also partially supported by Grant 17253616 from Hong Kong RGC.
Cite this article
Theocharidis, K., Liagouris, J., Mamoulis, N. et al. SRX: efficient management of spatial RDF data. The VLDB Journal 28, 703–733 (2019). https://doi.org/10.1007/s00778-019-00554-z
DOI: https://doi.org/10.1007/s00778-019-00554-z