
SRX: efficient management of spatial RDF data

  • Regular Paper
The VLDB Journal

Abstract

We present a general encoding scheme for the efficient management of spatial RDF data. The scheme approximates the geometries of the RDF entities inside their (integer) IDs and can be used, along with several operators and optimizations we introduce, to accelerate queries with spatial predicates and to re-encode entities dynamically in case of updates. We implement our ideas in SRX, a system built on top of the popular RDF-3X system. SRX extends RDF-3X with support for three types of spatial queries: range selections (e.g., find entities within a given polygon), spatial joins (e.g., find pairs of entities whose locations are close to each other), and spatial k-nearest neighbors (e.g., find the three entities closest to a given location). We evaluate SRX on spatial queries and updates with real RDF data, and we also compare its performance with the latest versions of three popular RDF stores. The results show SRX's superior performance over its competitors; compared to RDF-3X, SRX improves performance for queries with spatial predicates while incurring little overhead during updates.
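To make the idea of approximating a geometry inside an integer ID concrete, the following is a minimal sketch under stated assumptions. The bit layout (a Z-order cell index in the high bits, a per-cell counter in the low bits), the 16 x 16 grid over the unit square, and every function name are hypothetical choices made for illustration; the abstract does not specify the layout that SRX uses.

```python
# Illustrative only: the bit layout, grid resolution, and all names below are
# assumptions made for this sketch, not the layout used by SRX.
GRID_BITS = 4    # 2^4 x 2^4 uniform grid over the unit square (assumed)
LOCAL_BITS = 16  # low bits distinguish entities inside one cell (assumed)


def interleave(cx, cy):
    """Z-order (Morton) index of grid cell (cx, cy)."""
    z = 0
    for i in range(GRID_BITS):
        z |= ((cx >> i) & 1) << (2 * i)
        z |= ((cy >> i) & 1) << (2 * i + 1)
    return z


def encode(x, y, local):
    """Pack the cell containing point (x, y) in [0, 1)^2 and a counter into one ID."""
    n = 1 << GRID_BITS
    cx, cy = int(x * n), int(y * n)
    return (interleave(cx, cy) << LOCAL_BITS) | local


def cell_box(entity_id):
    """Recover the cell's box (xmin, ymin, xmax, ymax) from the ID alone."""
    z = entity_id >> LOCAL_BITS
    cx = cy = 0
    for i in range(GRID_BITS):
        cx |= ((z >> (2 * i)) & 1) << i
        cy |= ((z >> (2 * i + 1)) & 1) << i
    n = 1 << GRID_BITS
    return (cx / n, cy / n, (cx + 1) / n, (cy + 1) / n)


def may_intersect(entity_id, query):
    """Cheap filter: does the entity's cell overlap the query rectangle?"""
    xmin, ymin, xmax, ymax = cell_box(entity_id)
    qx1, qy1, qx2, qy2 = query
    return xmin <= qx2 and qx1 <= xmax and ymin <= qy2 and qy1 <= ymax
```

With such a layout, a range selection can discard many entities by inspecting IDs only, and the exact geometry is needed just for entities whose cell overlaps the query region.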

Notes

  1. For entities that have point geometries, the spatial selection can be evaluated using only the R-tree. If the entities have non-point geometries, the R-tree search may result in false positives; thus, the final results of the spatial filter are confirmed by retrieving the exact geometries from the dictionary.

  2. If the spatial join inputs are very small, we simply fetch the geometries of the input entity sets and do a nested-loop spatial join.

  3. Most spatial predicates, when translated to the grid-based approximations of the encoding, involve distance computations and/or cheap geometry intersection tests.

  4. Recall that the inputs are sorted by ID and that entities may be encoded at different granularities due to data skew or geometry extents. Therefore, using the cell ID of \(e_r\) alone is not sufficient and we have to use the minChildID of \(e_r\).

  5. The fact that the entities arrive from the inputs sorted by their IDs guarantees that they are also sorted based on their minChildIDs.

  6. Recall that the actual geometries of the entities have not been retrieved yet; otherwise, SHJ [19] would be used (see Sect. 4).

  7. If no spatial entity in the database falls in \(c_p\) or in one of its parent cells, we use as the limit the first free (i.e., the minimum) spatial ID for an entity in \(c_p\).

  8. https://tinyurl.com/yc4lxqdv.

  9. https://tinyurl.com/y7ukhge3.

  10. We only included a small separate cache of 40 KB for the R-tree. Since the OS caches R-tree pages, we used a small cache size in order to reduce the effect of double caching by the SaIL library.

  11. https://tinyurl.com/ydbscsxf.

  12. https://tinyurl.com/y7ukhge3.

  13. This check was not included in the version of RDF-3X we had, but we added it for consistency.
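As an illustration of Note 2, a nested-loop spatial join over two small inputs can be sketched as follows. This is a minimal example that assumes point geometries, a Euclidean distance predicate, and inputs already fetched into dictionaries; the function name, the input shape, and the `eps` parameter are hypothetical and are not taken from SRX.

```python
from math import hypot


def nested_loop_distance_join(left, right, eps):
    """Return all (l_id, r_id) pairs whose points lie within distance eps.

    left, right: dicts mapping an entity ID to its (x, y) point geometry
    (a hypothetical input shape chosen for this sketch).
    """
    result = []
    for l_id, (lx, ly) in left.items():
        for r_id, (rx, ry) in right.items():
            # Exact distance test on the fetched geometries.
            if hypot(lx - rx, ly - ry) <= eps:
                result.append((l_id, r_id))
    return result
```

The cost is quadratic in the input sizes, which is why this strategy only makes sense when both inputs are very small.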

References

  1. Abadi, D. J., Marcus, A., Madden, S., Hollenbach, K.J.: Scalable semantic web data management using vertical partitioning. In: VLDB (2007)

  2. Aberger, C.R., Tu, S., Olukotun, K., Ré, C.: Emptyheaded: a relational engine for graph processing. In: SIGMOD (2016)

  3. Aberger, C.R., Tu, S., Olukotun, K., Ré, C.: Old techniques for new join algorithms: a case study in RDF processing. In: ICDE Workshops (2016)

  4. Atre, M., Chaoji, V., Zaki, M.J., Hendler, J.A.: Matrix “Bit” loaded: a scalable lightweight join query processor for RDF data. In: WWW (2010)

  5. Battle, R., Kolas, D.: Enabling the geospatial semantic web with Parliament and GeoSPARQL. Semant. Web 3(4), 355–370 (2012)

  6. Bornea, M.A., Dolby, J., Kementsietsidis, A., Srinivas, K., Dantressangle, P., Udrea, O., Bhattacharjee, B.: Building an efficient RDF store over a relational database. In: SIGMOD (2013)

  7. Brinkhoff, T., Kriegel, H.-P., Seeger, B.: Efficient processing of spatial joins using R-trees. In: SIGMOD (1993)

  8. Brodt, A., Nicklas, D., Mitschang, B.: Deep integration of spatial query processing into native RDF triple stores. In: GIS (2010)

  9. Broekstra, J., Kampman, A., van Harmelen, F.: Sesame: An architecture for storing and querying RDF data and schema information. In: Semantics for the WWW. MIT Press (2001)

  10. Chong, E.I., Das, S., Eadon, G., Srinivasan, J.: An efficient SQL-based RDF querying scheme. In: VLDB (2005)

  11. Eldawy, A., Mokbel, M.F.: The era of big spatial data: a survey. Found. Trends Databases 6(3–4), 163–273 (2016)

  12. GraphDB. http://graphdb.ontotext.com

  13. Guttman, A.: R-trees: A dynamic index structure for spatial searching. In: SIGMOD (1984)

  14. Hadjieleftheriou, M., Hoel, E.G., Tsotras, V.J.: Sail: a spatial index library for efficient application integration. GeoInformatica 9(4), 367–389 (2005)

  15. Koubarakis, M., Kyzirakos, K.: Modeling and querying metadata in the semantic sensor web: the model stRDF and the query language stSPARQL. In: ESWC (2010)

  16. Kyzirakos, K., Karpathiotakis, M., Koubarakis, M.: Strabon: A semantic geospatial DBMS. In: ISWC (2012)

  17. Liagouris, J., Mamoulis, N., Bouros, P., Terrovitis, M.: An effective encoding scheme for spatial RDF data. Proc. VLDB Endow. 7(12), 1271–1282 (2014)

  18. Linkedgeodata. http://linkedgeodata.org/About

  19. Lo, M.-L., Ravishankar, C.V.: Spatial hash-joins. In: SIGMOD (1996)

  20. Mamoulis, N.: Spatial Data Management. Morgan & Claypool Publishers, San Rafael (2011)

  21. Mamoulis, N., Papadias, D.: Slot index spatial join. TKDE 15(1), 211–231 (2003)

  22. Mouratidis, K., Hadjieleftheriou, M., Papadias, D.: Conceptual partitioning: an efficient method for continuous nearest neighbor monitoring. In: SIGMOD (2005)

  23. Neumann, T., Moerkotte, G.: Characteristic sets: accurate cardinality estimation for RDF queries with multiple joins. In: ICDE (2011)

  24. Neumann, T., Weikum, G.: Scalable join processing on very large RDF graphs. In: SIGMOD (2009)

  25. Neumann, T., Weikum, G.: RDF-3X: a RISC-style engine for RDF. Proc. VLDB Endow. 1(1), 647–659 (2008)

  26. Neumann, T., Weikum, G.: The RDF-3X engine for scalable management of RDF data. VLDB J. 19(1), 91–113 (2010)

  27. Neumann, T., Weikum, G.: x-RDF-3X: fast querying, high update rates, and consistency for RDF databases. Proc. VLDB Endow. 3(1–2), 256–263 (2010)

  28. Nikitopoulos, P., Vlachou, A., Doulkeridis, C., Vouros, G.A.: DiStRDF: distributed spatio-temporal RDF queries on Spark. In: EDBT/ICDT (2018)

  29. Pandey, V., Kipf, A., Neumann, T., Kemper, A.: How good are modern spatial analytics systems? Proc. VLDB Endow. 11(11), 1661–1673 (2018)

  30. Parliament. http://parliament.semwebcentral.org

  31. Patroumpas, K., Giannopoulos, G., Athanasiou, S.: Towards geospatial semantic data management: strengths, weaknesses, and challenges ahead. In: GIS (2014)

  32. Virtuoso. http://virtuoso.openlinksw.com

  33. Wang, C.-J., Ku, W.-S., Chen, H.: Geo-store: a spatially-augmented SPARQL query evaluation system. In: GIS (2012)

  34. Wang, D., Zou, L., Feng, Y., Shen, X., Tian, J., Zhao, D.: S-store: an engine for large RDF graph integrating spatial information. In: DASFAA (2013)

  35. Weiss, C., Karras, P., Bernstein, A.: Hexastore: sextuple indexing for semantic web data management. Proc. VLDB Endow. 1(1), 1008–1019 (2008)

  36. Wilkinson, K., Sayers, C., Kuno, H.A., Reynolds, D.: Efficient RDF storage and retrieval in Jena2. In: SWDB (2003)

  37. YAGO. https://en.wikipedia.org/wiki/YAGO_(database)

  38. Yan, Y., Wang, C., Zhou, A., Qian, W., Ma, L., Pan, Y.: Efficient indices using graph partitioning in RDF triple stores. In: ICDE (2009)

  39. Yuan, P., Liu, P., Wu, B., Jin, H., Zhang, W., Liu, L.: TripleBit: a fast and compact system for large scale RDF data. Proc. VLDB Endow. 6(7), 517–528 (2013)

  40. Zeng, K., Yang, J., Wang, H., Shao, B., Wang, Z.: A distributed graph engine for web scale RDF data. Proc. VLDB Endow. 6(4), 265–276 (2013)

  41. Zou, L., Mo, J., Chen, L., Özsu, M.T., Zhao, D.: gStore: answering SPARQL queries via subgraph matching. Proc. VLDB Endow. 4(8), 482–493 (2011)

Acknowledgements

We acknowledge support of this work by the project “Moving from Big Data Management to Data Science” (MIS 5002437/3) which is implemented under the Action “Reinforcement of the Research and Innovation Infrastructure,” funded by the Operational Programme “Competitiveness, Entrepreneurship and Innovation” (NSRF 2014–2020) and co-financed by Greece and the European Union (European Regional Development Fund). This work is also partially supported by Grant 17253616 from Hong Kong RGC.

Author information

Corresponding author

Correspondence to John Liagouris.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 143 KB)

About this article

Cite this article

Theocharidis, K., Liagouris, J., Mamoulis, N. et al. SRX: efficient management of spatial RDF data. The VLDB Journal 28, 703–733 (2019). https://doi.org/10.1007/s00778-019-00554-z

  • DOI: https://doi.org/10.1007/s00778-019-00554-z
