S2X: Graph-Parallel Querying of RDF with GraphX

Schätzle, Alexander; Przyjaciel-Zablocki, Martin; Berberich, Thorsten; Lausen, Georg

doi:10.1007/978-3-319-41576-5_12

Conference paper
First Online: 24 June 2016

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9579)

Abstract

RDF has steadily gained attention for data publishing due to its flexible data model, raising the need for distributed querying. However, existing approaches built on general-purpose cluster frameworks take a record-oriented view of RDF and ignore its inherent graph-like structure. Recently, GraphX was published as a graph abstraction on top of Spark, an in-memory cluster computing system. It allows graph-parallel and data-parallel computation to be combined seamlessly in a single system, a unique feature not available in other systems. In this paper we introduce S2X, a SPARQL query processor for Hadoop that leverages this unified abstraction by implementing basic graph pattern matching of SPARQL as a graph-parallel task, while the other operators are implemented in a data-parallel manner. To the best of our knowledge, this is the first approach to combine graph-parallel and data-parallel computation for SPARQL querying of RDF data based on Hadoop.
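To make the split described in the abstract concrete, the following is a minimal single-machine sketch in plain Python. It is not GraphX code and not the S2X algorithm; the sample triples, the query, and the helper names (`match_pattern`, `compatible`, `match_bgp`) are assumptions made up for illustration. It only shows the two kinds of work: matching a basic graph pattern against the edges of an RDF graph, and then applying the remaining operators (here a filter and a projection) row by row over the resulting bindings.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]   # (subject, predicate, object)
Binding = Dict[str, str]        # variable name -> bound value


def is_var(term: str) -> bool:
    """Variables are written SPARQL-style with a leading '?'."""
    return term.startswith("?")


def match_pattern(pattern: Triple, triple: Triple) -> Optional[Binding]:
    """Match one triple pattern against one edge; return bindings or None."""
    binding: Binding = {}
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # The same variable must not be bound to two different values.
            if binding.get(p_term, t_term) != t_term:
                return None
            binding[p_term] = t_term
        elif p_term != t_term:
            return None
    return binding


def compatible(a: Binding, b: Binding) -> bool:
    """Two bindings are compatible if they agree on all shared variables."""
    return all(b.get(var, val) == val for var, val in a.items())


def match_bgp(bgp: List[Triple], triples: List[Triple]) -> List[Binding]:
    """Match every pattern against every edge, then join compatible bindings."""
    solutions: List[Binding] = [{}]
    for pattern in bgp:
        matches = []
        for triple in triples:
            m = match_pattern(pattern, triple)
            if m is not None:
                matches.append(m)
        solutions = [{**s, **m} for s in solutions for m in matches
                     if compatible(s, m)]
    return solutions


# Hypothetical sample data and query, for illustration only.
triples = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "age", "30"),
    ("bob", "age", "25"),
]
bgp = [("?x", "knows", "?y"), ("?y", "age", "?a")]

# Graph part: basic graph pattern matching over the edges.
solutions = match_bgp(bgp, triples)

# Row-wise part: FILTER and projection applied to each solution independently.
filtered = [s for s in solutions if int(s["?a"]) < 30]
projected = [(s["?x"], s["?a"]) for s in filtered]
print(projected)
```

In a distributed setting, the pattern matching step is the part that would map onto a graph-parallel abstraction, while the filter and projection are independent per row and therefore fit a data-parallel one.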

Author information

Authors and Affiliations

Department of Computer Science, University of Freiburg, Georges-Köhler-Allee 051, 79110, Freiburg, Germany
Alexander Schätzle, Martin Przyjaciel-Zablocki, Thorsten Berberich & Georg Lausen

Corresponding author

Correspondence to Alexander Schätzle.

Editor information

Editors and Affiliations

Stony Brook University, Stony Brook, New York, USA
Fusheng Wang
University of Utah, Salt Lake City, Utah, USA
Gang Luo
Columbia University, New York, New York, USA
Chunhua Weng
Nanyang Technological University, Singapore, Singapore
Arijit Khan
Qatar Computing Research Institute, Doha, Qatar
Prasenjit Mitra
Google Research, New York, New York, USA
Cong Yu

About this paper

Cite this paper

Schätzle, A., Przyjaciel-Zablocki, M., Berberich, T., Lausen, G. (2016). S2X: Graph-Parallel Querying of RDF with GraphX. In: Wang, F., Luo, G., Weng, C., Khan, A., Mitra, P., Yu, C. (eds) Biomedical Data Management and Graph Online Querying. Big-O(Q) 2015, DMAH 2015. Lecture Notes in Computer Science, vol 9579. Springer, Cham. https://doi.org/10.1007/978-3-319-41576-5_12

DOI: https://doi.org/10.1007/978-3-319-41576-5_12
Published: 24 June 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-41575-8
Online ISBN: 978-3-319-41576-5
eBook Packages: Computer Science; Computer Science (R0)
