
Parallel and scalable processing of spatio-temporal RDF queries using Spark

GeoInformatica

Abstract

The ever-increasing size of data emanating from mobile devices and sensors dictates the use of distributed systems for storing and querying these data. Typically, such data sources provide spatio-temporal information alongside other useful data. The RDF data model can be used to interlink and exchange data originating from heterogeneous sources in a uniform manner. For example, consider vessels that regularly report their spatio-temporal positions through various surveillance systems. In this scenario, a user might want to know which vessels were moving in a specific area during a given time range. In this paper, we address the problem of efficiently storing and querying spatio-temporal RDF data in parallel. We specifically study SPARQL queries with spatio-temporal constraints and propose the DiStRDF system, which consists of a Storage Layer and a Processing Layer. The DiStRDF Storage Layer is responsible for efficiently storing large amounts of historical spatio-temporal RDF data of moving objects. On top of it, we devise the DiStRDF Processing Layer, which parses a SPARQL query and produces corresponding logical and physical execution plans. We use Spark, a well-known distributed in-memory processing framework, as the underlying processing engine. Our experimental evaluation on real data from both the aviation and maritime domains demonstrates the efficiency of DiStRDF under various spatio-temporal range constraints.
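The example question above (which vessels were moving in a given area during a given time range) reduces to a spatial bounding-box test combined with a temporal interval test on each position report. The following is a minimal single-machine sketch in Python, assuming each position node has already been flattened into a record with longitude, latitude and a timestamp. The `Position` and `STRange` names and this flat layout are illustrative assumptions, not DiStRDF's storage format; a distributed engine would evaluate the same predicate on each data partition in parallel.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Position:
    """One position report, flattened from its RDF node (hypothetical layout)."""
    node: str           # the position node, i.e. the ?n binding
    moving_object: str  # the :ofMovingObject value, e.g. a vessel IRI
    lon: float
    lat: float
    timestamp: int      # seconds since the epoch


@dataclass(frozen=True)
class STRange:
    """A spatio-temporal range: a lon/lat bounding box plus a time interval."""
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float
    t_start: int
    t_end: int

    def contains(self, p: Position) -> bool:
        # Inclusive bounds in all three dimensions.
        return (self.min_lon <= p.lon <= self.max_lon
                and self.min_lat <= p.lat <= self.max_lat
                and self.t_start <= p.timestamp <= self.t_end)


def objects_in_range(positions: Iterable[Position], rng: STRange) -> List[str]:
    """Distinct moving objects with at least one position inside rng, sorted."""
    return sorted({p.moving_object for p in positions if rng.contains(p)})


if __name__ == "__main__":
    reports = [
        Position("n1", "vessel_A", 23.6, 37.9, 1000),
        Position("n2", "vessel_B", 25.0, 35.0, 1000),
        Position("n3", "vessel_A", 23.7, 37.95, 5000),
    ]
    area_and_window = STRange(23.0, 37.5, 24.0, 38.5, 0, 2000)
    print(objects_in_range(reports, area_and_window))  # ['vessel_A']
```

Here only `n1` satisfies both constraints: `n2` lies outside the box and `n3` lies outside the time window.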


Figures: Fig. 1–Fig. 18


Notes

  1. https://www.w3.org/

  2. http://www.opengeospatial.org/

  3. https://www.opengeospatial.org/standards/wkt-crs

  4. https://www.opengeospatial.org/standards/geosparql

  5. https://redis.io/

  6. https://parquet.apache.org/

  7. https://jena.apache.org/

  8. A partitioner is a mechanism that determines the location (i.e., the node) of each record during repartitioning.

  9. https://github.com/xetorthio/jedis
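Note 8 describes a partitioner only at a high level. In Spark, a custom partitioner extends the abstract class `org.apache.spark.Partitioner` and implements `numPartitions` and `getPartition(key)`, which returns a partition index; Spark's scheduler then decides which node processes each partition. The Python sketch below mimics that contract for a spatio-temporal key. The uniform grid, the 1-degree cells and the one-hour time buckets are assumptions chosen for illustration and are not the partitioning scheme used by DiStRDF.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class GridPartitioner:
    """Hypothetical spatio-temporal grid partitioner, for illustration only.

    Mirrors the contract of Spark's Partitioner (numPartitions and
    getPartition(key)) for keys of the form (lon, lat, timestamp).
    Records in the same grid cell and time bucket always share a partition.
    """
    num_partitions: int
    cell_deg: float = 1.0      # assumed cell size in degrees
    bucket_secs: int = 3600    # assumed time-bucket length in seconds

    def cell_of(self, lon: float, lat: float, ts: int) -> Tuple[int, int, int]:
        cols = math.ceil(360 / self.cell_deg)
        rows = math.ceil(180 / self.cell_deg)
        # Shift to non-negative coordinates, then clamp the +180/+90 edges.
        cx = min(math.floor((lon + 180.0) / self.cell_deg), cols - 1)
        cy = min(math.floor((lat + 90.0) / self.cell_deg), rows - 1)
        return cx, cy, ts // self.bucket_secs

    def get_partition(self, lon: float, lat: float, ts: int) -> int:
        cols = math.ceil(360 / self.cell_deg)
        rows = math.ceil(180 / self.cell_deg)
        cx, cy, ct = self.cell_of(lon, lat, ts)
        # Linearize (cx, cy, ct) into one integer and wrap it onto the partitions.
        return ((ct * rows + cy) * cols + cx) % self.num_partitions
```

Keeping spatially and temporally close records together means a range query touches fewer partitions; the trade-off is possible skew when many reports fall into the same cell and bucket.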


Acknowledgements

This work is supported by the datAcron project, which has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 687591.

Author information

Corresponding author

Correspondence to Panagiotis Nikitopoulos.

Appendix

A. SPARQL queries used in experiments

SPARQL Query 1 (Maritime Domain)

Prefix : <http://www.datacron-project.eu/datAcron#>

SELECT * WHERE {
      ?n :ofMovingObject ?ves ;
         :hasHeading ?heading ;
         :hasSpeed ?speed .
}
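Query 1, like Queries 2, 4, 5 and 7, is a star-shaped pattern: every triple pattern shares the subject ?n, so it can be answered by grouping triples by subject rather than joining the patterns one at a time. The following is a minimal in-memory Python sketch of that evaluation strategy, assuming triples are plain (subject, predicate, object) string tuples and that the shared subject is bound to a variable named `n`. It is illustrative only and is not DiStRDF's physical operator.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]


def eval_star(triples: Iterable[Triple], predicates: Dict[str, str]) -> List[Dict[str, str]]:
    """Evaluate a star-shaped basic graph pattern whose patterns share subject ?n.

    `predicates` maps each predicate to the variable it binds, e.g.
    {":ofMovingObject": "ves", ":hasHeading": "heading"}.
    Multi-valued predicates yield one solution per combination of values.
    """
    # Group the objects of relevant predicates by subject.
    by_subject: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        if p in predicates:
            by_subject[s][p].append(o)

    results: List[Dict[str, str]] = []
    for s, props in by_subject.items():
        if len(props) < len(predicates):
            continue  # the subject lacks a required predicate: no match
        rows = [{"n": s}]
        for p, var in predicates.items():
            # Cross product with every value of this predicate.
            rows = [{**row, var: o} for row in rows for o in props[p]]
        results.extend(rows)
    return results
```

For example, a subject that has `:ofMovingObject` and `:hasHeading` but no `:hasSpeed` produces no solution for Query 1, exactly as a conjunctive SPARQL pattern requires.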

SPARQL Query 2 (Maritime Domain)

Prefix : <http://www.datacron-project.eu/datAcron#>

SELECT * WHERE {
      ?n :ofMovingObject ?ves ;
         :hasHeading ?heading ;
         :hasSpeed ?speed ;
         :hasWeatherCondition ?w .
}

SPARQL Query 3 (Maritime Domain)

Prefix : <http://www.datacron-project.eu/datAcron#>

SELECT * WHERE {
      ?n :ofMovingObject ?ves ;
         :hasHeading ?heading ;
         :hasSpeed ?speed ;
         :hasWeatherCondition ?w .
      :StoppedInit :occurs ?n .
}

SPARQL Query 4 (Aviation Domain)

Prefix : <http://www.datacron-project.eu/datAcron#>

SELECT * WHERE {
      ?n :ofMovingObject ?aircraft ;
         :hasHeading ?heading ;
         :hasAirspeed ?speed .
}

SPARQL Query 5 (Aviation Domain)

Prefix : <http://www.datacron-project.eu/datAcron#>

SELECT * WHERE {
      ?n :ofMovingObject ?aircraft ;
         :hasHeading ?heading ;
         :hasAirspeed ?speed ;
         :hasWeatherCondition ?w .
}

SPARQL Query 6 (Aviation Domain)

Prefix : <http://www.datacron-project.eu/datAcron#>

SELECT * WHERE {
      ?n :ofMovingObject ?aircraft ;
         :hasHeading ?heading ;
         :hasAirspeed ?speed ;
         :hasWeatherCondition ?w .
      ?w :reportedMaxTemperature ?temp .
}

SPARQL Query 7 (Aviation Domain)

Prefix : <http://www.datacron-project.eu/datAcron#>
Prefix dul: <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#>

SELECT * WHERE {
      ?n dul:hasTemporalFeature ?temporal ;
         :hasAltitude ?altitude ;
         :hasGeometry ?geom .
}

SPARQL Query 8 (Aviation Domain)

Prefix : <http://www.datacron-project.eu/datAcron#>
Prefix dul: <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#>

SELECT * WHERE {
      ?n dul:hasTemporalFeature ?temporal ;
         :hasAltitude ?altitude ;
         :hasGeometry ?geom .
      ?geom :hasWKT ?wkt .
}
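Query 8 additionally retrieves each position's geometry as a WKT literal (?wkt). Applying a spatial range constraint to such results requires decoding that literal first. The following is a minimal Python sketch, assuming 2D point geometries written as plain WKT `POINT (x y)` literals with no CRS IRI prefix, with x taken as longitude and y as latitude. Real GeoSPARQL literals may carry a CRS IRI or use other geometry types, and a production system would use a geometry library rather than a regular expression.

```python
import re
from typing import Optional, Tuple

# Matches plain 2D WKT point literals such as "POINT (23.63 37.94)" or
# "POINT(23.63 37.94)". Other geometry types and CRS prefixes are not handled.
_POINT_RE = re.compile(
    r"^\s*POINT\s*\(\s*([-+]?\d+(?:\.\d+)?)\s+([-+]?\d+(?:\.\d+)?)\s*\)\s*$",
    re.IGNORECASE,
)


def parse_wkt_point(wkt: str) -> Optional[Tuple[float, float]]:
    """Return (x, y) for a 2D WKT POINT literal, or None if it is not one."""
    m = _POINT_RE.match(wkt)
    if m is None:
        return None
    return float(m.group(1)), float(m.group(2))


def in_box(wkt: str, min_x: float, min_y: float, max_x: float, max_y: float) -> bool:
    """Spatial range test on a WKT point literal, with inclusive bounds."""
    pt = parse_wkt_point(wkt)
    if pt is None:
        return False  # unsupported or malformed geometry: treat as no match
    x, y = pt
    return min_x <= x <= max_x and min_y <= y <= max_y
```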

SPARQL Query 9 (Aviation Domain)

Prefix : <http://www.datacron-project.eu/datAcron#>
Prefix dul: <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#>

SELECT * WHERE {
      ?n dul:hasTemporalFeature ?temporal ;
         :hasAltitude ?altitude ;
         :hasGeometry ?geom .
      ?geom :hasWKT ?wkt .
      ?traj dul:hasPart ?n .
}

About this article

Cite this article

Nikitopoulos, P., Vlachou, A., Doulkeridis, C. et al. Parallel and scalable processing of spatio-temporal RDF queries using Spark. Geoinformatica 25, 623–653 (2021). https://doi.org/10.1007/s10707-019-00371-0

  • DOI: https://doi.org/10.1007/s10707-019-00371-0
