SARROD: SPARQL Analyzer and Reordering for Runtime Optimization on Big Data

Tripathi, Nishtha; Banerjee, Subhasis

doi:10.1007/978-3-319-13820-6_17

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8883)

Abstract

The Resource Description Framework (RDF) has been widely adopted for representing web resources and structured data as graphs. The growth of Big Data makes processing these graphs a scalability challenge, because the graphs can become extremely large. Growth in the data size (and in turn the graph size) will have a compounding effect on the execution times of queries on the graph. Although big data platforms such as Hadoop can mitigate the problem, the ordering of query constraints and the data flow between them present opportunities for further optimization. SPARQL, a widely used RDF query language, suffers from the same bottleneck on large graphs. There is hardly any established method for generating all equivalent reorderings of a SPARQL query containing joins, outer joins, and group-by aggregations. In this paper, we propose a query reordering algorithm, SARROD, that leverages properties of the graph that are simple to compute yet powerful for runtime optimization. Experimental results show that SARROD reduces the response time of SPARQL queries on the order of 12% compared to the non-reordered sequence when they are executed over the SHARD graph-store (triple-store), which is built on the Hadoop implementation of MapReduce.
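To make the idea of reordering query constraints concrete, the following is a minimal sketch of a generic greedy heuristic, not the SARROD algorithm itself, whose details are not given in the abstract. It assumes a query is a list of triple patterns, that terms beginning with "?" are variables, and that a pattern with more bound (non-variable) terms is likely to match fewer triples. The function names and the "ex:" terms are hypothetical and chosen only for illustration.

```python
from typing import List, Set, Tuple

# A triple pattern is (subject, predicate, object); "?name" marks a variable.
Triple = Tuple[str, str, str]


def is_var(term: str) -> bool:
    """Return True if the term is a SPARQL-style variable."""
    return term.startswith("?")


def variables(pattern: Triple) -> Set[str]:
    """Collect the variables that appear in a triple pattern."""
    return {t for t in pattern if is_var(t)}


def bound_count(pattern: Triple) -> int:
    """Count the non-variable terms (assumed proxy for selectivity)."""
    return sum(1 for t in pattern if not is_var(t))


def reorder(patterns: List[Triple]) -> List[Triple]:
    """Greedily order patterns (illustrative heuristic, not SARROD).

    At each step, prefer a pattern that shares a variable with the
    patterns already chosen (so each join has a join key), and among
    those prefer the pattern with the most bound terms.
    """
    remaining = list(patterns)
    ordered: List[Triple] = []
    seen: Set[str] = set()
    while remaining:
        def score(p: Triple) -> Tuple[bool, int]:
            connected = not ordered or bool(variables(p) & seen)
            return (connected, bound_count(p))

        best = max(remaining, key=score)  # ties keep the original order
        remaining.remove(best)
        ordered.append(best)
        seen |= variables(best)
    return ordered


if __name__ == "__main__":
    query = [
        ("?x", "ex:memberOf", "?d"),
        ("?x", "rdf:type", "ex:Student"),
        ("?d", "ex:subOrganizationOf", "ex:Univ0"),
    ]
    for pattern in reorder(query):
        print(pattern)
```

In a clause-iteration setting where each pattern becomes one MapReduce join step, an ordering like this would aim to keep intermediate results small; whether it actually does depends on the data, which this sketch does not inspect.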

Author information

Authors and Affiliations

LNM Institute of Information Technology, India
Nishtha Tripathi
IIIT-Delhi, India
Subhasis Banerjee

Editor information

Editors and Affiliations

International Institute of Information Technology - Bangalore, 26/C, Electronics City, Hosur Road, 560100, Bangalore, India
Srinath Srinivasa
IBM Research - India, 4 Block C, Institutional Area, Vasant Kunj, 110070, New Delhi, India
Sameep Mehta

About this paper

Cite this paper

Tripathi, N., Banerjee, S. (2014). SARROD: SPARQL Analyzer and Reordering for Runtime Optimization on Big Data. In: Srinivasa, S., Mehta, S. (eds) Big Data Analytics. BDA 2014. Lecture Notes in Computer Science, vol 8883. Springer, Cham. https://doi.org/10.1007/978-3-319-13820-6_17

DOI: https://doi.org/10.1007/978-3-319-13820-6_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-13819-0
Online ISBN: 978-3-319-13820-6
eBook Packages: Computer Science, Computer Science (R0)
