Abstract
The ever-increasing volume of data emanating from mobile devices and sensors dictates the use of distributed systems for storing and querying these data. Typically, such data sources provide spatio-temporal information alongside other useful data. The RDF data model can be used to interlink and exchange data originating from heterogeneous sources in a uniform manner. For example, consider the case where vessels regularly report their spatio-temporal position through various surveillance systems. In this scenario, a user might want to know which vessels were moving in a specific area during a given time range. In this paper, we address the problem of efficiently storing and querying spatio-temporal RDF data in parallel. We specifically study SPARQL queries with spatio-temporal constraints and propose the DiStRDF system, which consists of a Storage Layer and a Processing Layer. The DiStRDF Storage Layer is responsible for efficiently storing large amounts of historical spatio-temporal RDF data of moving objects. On top of it, we devise the DiStRDF Processing Layer, which parses a SPARQL query and produces the corresponding logical and physical execution plans. We use Spark, a well-known distributed in-memory processing framework, as the underlying processing engine. Our experimental evaluation on real data from both the aviation and maritime domains demonstrates the efficiency of DiStRDF under various spatio-temporal range constraints.
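The vessel example above boils down to a spatio-temporal range predicate: a report qualifies if its position lies inside a spatial box and its timestamp lies inside a time interval. The following is a minimal single-machine sketch of that predicate, assuming each report carries a longitude, a latitude and a timestamp; the `Position` and `STRange` types and the `objects_in_range` helper are hypothetical illustrations, not DiStRDF's data model or code.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Position:
    """Hypothetical, simplified position report of a moving object."""
    object_id: str
    lon: float
    lat: float
    ts: datetime


@dataclass(frozen=True)
class STRange:
    """Spatio-temporal range: an axis-aligned lon/lat box plus a closed time interval."""
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float
    t_start: datetime
    t_end: datetime

    def contains(self, p: Position) -> bool:
        # A report qualifies only if it satisfies the spatial AND the temporal constraint.
        return (self.min_lon <= p.lon <= self.max_lon
                and self.min_lat <= p.lat <= self.max_lat
                and self.t_start <= p.ts <= self.t_end)


def objects_in_range(positions, rng: STRange):
    """Return the sorted ids of objects with at least one report inside rng."""
    return sorted({p.object_id for p in positions if rng.contains(p)})
```

For instance, a box around a port combined with a one-day interval keeps only the vessels that reported a position inside that box on that day; a vessel that was in the box on a different day, or outside the box on the same day, is filtered out.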
Notes
A partitioner is a mechanism that determines the location (i.e., the node) of each record during the repartitioning process.
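To make the definition concrete, here is a minimal hash-partitioning sketch in Python. It loosely mirrors the shape of Spark's `Partitioner` abstraction (a partition count plus a `getPartition(key)` method), but the class and function names here are hypothetical, CRC32 is an arbitrary choice of deterministic hash, and nothing in it implies that DiStRDF partitions its data by hashing. The partition id stands in for the record's location.

```python
import zlib
from collections import defaultdict


class HashPartitioner:
    """Hypothetical partitioner: maps a record key to one of num_partitions ids."""

    def __init__(self, num_partitions: int):
        if num_partitions <= 0:
            raise ValueError("num_partitions must be positive")
        self.num_partitions = num_partitions

    def get_partition(self, key: str) -> int:
        # crc32 is deterministic across runs and returns an unsigned 32-bit int,
        # so the modulo always yields an id in [0, num_partitions).
        return zlib.crc32(key.encode("utf-8")) % self.num_partitions


def repartition(records, key_fn, partitioner):
    """Group records by the partition id assigned to key_fn(record)."""
    parts = defaultdict(list)
    for r in records:
        parts[partitioner.get_partition(key_fn(r))].append(r)
    return dict(parts)
```

Because the partition id depends only on the key, all records with the same key (for example, the same subject) end up in the same partition, which is what makes a later per-partition computation on that key possible without further data movement.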
Acknowledgements
This work is supported by the datAcron project, which has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 687591.
Appendix
A. SPARQL queries used in experiments
SPARQL Query 1 (Maritime Domain)
Prefix : <http://www.datacron-project.eu/datAcron#>
SELECT * WHERE {
?n :ofMovingObject ?ves ;
:hasHeading ?heading ;
:hasSpeed ?speed .
}
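Query 1 above is a star-shaped pattern: every triple pattern shares the subject variable ?n. One simple way to evaluate such a star is to group the relevant triples by subject, keep the subjects that have every required predicate, and combine their objects into bindings. The sketch below does this in memory on a single machine; the `eval_star` function and the tuple layout of the triples are hypothetical, and this is not a description of DiStRDF's physical operators.

```python
from collections import defaultdict


def eval_star(triples, subject_var, pattern):
    """Evaluate a star-shaped basic graph pattern (all patterns share one subject).

    triples: iterable of (subject, predicate, object) tuples.
    subject_var: name of the shared subject variable, e.g. "n".
    pattern: list of (predicate, object_variable) pairs; predicates assumed distinct.
    Returns a list of binding dicts, one per combination of matching objects.
    """
    wanted = {p for p, _ in pattern}
    # Group only the relevant triples: subject -> predicate -> [objects].
    by_subject = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        if p in wanted:
            by_subject[s][p].append(o)
    results = []
    for s, objs in by_subject.items():
        if len(objs) < len(wanted):
            continue  # this subject lacks at least one predicate of the star
        rows = [{subject_var: s}]
        for p, var in pattern:
            # Extend every partial binding with each object of predicate p.
            rows = [{**row, var: o} for row in rows for o in objs[p]]
        results.extend(rows)
    return results
```

With the triples grouped by subject, the star is answered without joining data across different subjects; a multi-valued predicate simply yields one binding per value.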
SPARQL Query 2 (Maritime Domain)
Prefix : <http://www.datacron-project.eu/datAcron#>
SELECT * WHERE {
?n :ofMovingObject ?ves ;
:hasHeading ?heading ;
:hasSpeed ?speed ;
:hasWeatherCondition ?w .
}
SPARQL Query 3 (Maritime Domain)
Prefix : <http://www.datacron-project.eu/datAcron#>
SELECT * WHERE {
?n :ofMovingObject ?ves ;
:hasHeading ?heading ;
:hasSpeed ?speed ;
:hasWeatherCondition ?w .
:StoppedInit :occurs ?n .
}
SPARQL Query 4 (Aviation Domain)
Prefix : <http://www.datacron-project.eu/datAcron#>
SELECT * WHERE {
?n :ofMovingObject ?aircraft ;
:hasHeading ?heading ;
:hasAirspeed ?speed .
}
SPARQL Query 5 (Aviation Domain)
Prefix : <http://www.datacron-project.eu/datAcron#>
SELECT * WHERE {
?n :ofMovingObject ?aircraft ;
:hasHeading ?heading ;
:hasAirspeed ?speed ;
:hasWeatherCondition ?w .
}
SPARQL Query 6 (Aviation Domain)
Prefix : <http://www.datacron-project.eu/datAcron#>
SELECT * WHERE {
?n :ofMovingObject ?aircraft ;
:hasHeading ?heading ;
:hasAirspeed ?speed ;
:hasWeatherCondition ?w .
?w :reportedMaxTemperature ?temp .
}
SPARQL Query 7 (Aviation Domain)
Prefix : <http://www.datacron-project.eu/datAcron#>
Prefix dul: <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#>
SELECT * WHERE {
?n dul:hasTemporalFeature ?temporal ;
:hasAltitude ?altitude ;
:hasGeometry ?geom .
}
SPARQL Query 8 (Aviation Domain)
Prefix : <http://www.datacron-project.eu/datAcron#>
Prefix dul: <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#>
SELECT * WHERE {
?n dul:hasTemporalFeature ?temporal ;
:hasAltitude ?altitude ;
:hasGeometry ?geom .
?geom :hasWKT ?wkt .
}
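Query 8 adds the pattern ?geom :hasWKT ?wkt to the star on ?n. The two parts share the variable ?geom, so the star's bindings must be joined with the bindings of the new pattern on ?geom. The sketch below uses a plain in-memory hash join on lists of binding dicts; the `hash_join` helper is hypothetical, and the choice of a hash join is an illustrative swap-in rather than a statement about which join DiStRDF's Processing Layer selects.

```python
from collections import defaultdict


def hash_join(left, right, var):
    """Inner-join two lists of binding dicts on the shared variable `var`.

    Builds a hash table over `right` keyed by `var`, then probes it with
    every binding in `left`; each match produces one merged binding.
    """
    table = defaultdict(list)
    for r in right:
        table[r[var]].append(r)
    out = []
    for l in left:
        # .get avoids inserting empty entries for keys with no match.
        for r in table.get(l[var], []):
            out.append({**l, **r})
    return out
```

In practice the smaller input would be the one hashed; here `right` is hashed purely for simplicity.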
SPARQL Query 9 (Aviation Domain)
Prefix : <http://www.datacron-project.eu/datAcron#>
Prefix dul: <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#>
SELECT * WHERE {
?n dul:hasTemporalFeature ?temporal ;
:hasAltitude ?altitude ;
:hasGeometry ?geom .
?geom :hasWKT ?wkt .
?traj dul:hasPart ?n .
}
About this article
Cite this article
Nikitopoulos, P., Vlachou, A., Doulkeridis, C. et al. Parallel and scalable processing of spatio-temporal RDF queries using Spark. Geoinformatica 25, 623–653 (2021). https://doi.org/10.1007/s10707-019-00371-0
DOI: https://doi.org/10.1007/s10707-019-00371-0