Abstract
The Resource Description Framework (RDF) has been widely adopted for representing web resources and as a structured graph data model. The growth of Big Data makes processing these graphs a scalability challenge, since the graphs can become extremely large. Growth in the data size (and in turn the graph size) has a compounding effect on the execution times of queries on the graph. Although big data platforms such as Hadoop can mitigate the problem, the ordering of query constraints and the data flow between them present opportunities for further optimization. SPARQL, a widely used RDF query language, suffers from the same bottleneck on large graphs. There is hardly any established method for generating all equivalent reorderings of a SPARQL query containing joins, outer joins, and group-by aggregations. In this paper, we propose SARROD, a query reordering algorithm that leverages properties of the graph that are simple to compute yet powerful for runtime optimization. Experimental results show that SARROD reduces the response time of SPARQL queries executed over the SHARD graph-store (triple-store), built on the Hadoop implementation of MapReduce, by on the order of 12% compared to the non-reordered sequence.
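To illustrate why the order of query constraints affects the data flowing between them, here is a minimal sketch in Python. It is not the SARROD algorithm, which the abstract does not spell out. It assumes a tiny in-memory triple list, a hypothetical greedy heuristic (`reorder`) that uses per-pattern match counts as the "simple to compute" statistic, and a naive evaluator (`evaluate`) that counts intermediate bindings. All names and data are invented for illustration.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[str, str, str]  # terms starting with "?" are variables


def is_var(term: str) -> bool:
    return term.startswith("?")


def variables(pattern: Pattern) -> Set[str]:
    return {term for term in pattern if is_var(term)}


def match_count(pattern: Pattern, triples: List[Triple]) -> int:
    """Number of triples matching a pattern on its constant terms only."""
    return sum(
        all(is_var(p) or p == t for p, t in zip(pattern, triple))
        for triple in triples
    )


def evaluate(patterns: List[Pattern], triples: List[Triple]):
    """Join patterns left to right; return (solutions, intermediate bindings)."""
    bindings: List[Dict[str, str]] = [{}]
    intermediate = 0
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                candidate = dict(binding)
                ok = True
                for p, t in zip(pattern, triple):
                    if is_var(p):
                        # Bind the variable, or check consistency if already bound.
                        if candidate.setdefault(p, t) != t:
                            ok = False
                            break
                    elif p != t:
                        ok = False
                        break
                if ok:
                    next_bindings.append(candidate)
        bindings = next_bindings
        intermediate += len(bindings)
    return bindings, intermediate


def reorder(patterns: List[Pattern], triples: List[Triple]) -> List[Pattern]:
    """Hypothetical heuristic: most selective pattern first, then stay connected."""
    remaining = sorted(patterns, key=lambda p: match_count(p, triples))
    ordered = [remaining.pop(0)]
    bound = variables(ordered[0])
    while remaining:
        # Prefer patterns sharing a variable with what is already bound,
        # to avoid cross products; fall back to the most selective one.
        connected = [p for p in remaining if bound & variables(p)]
        chosen = (connected or remaining)[0]
        remaining.remove(chosen)
        ordered.append(chosen)
        bound |= variables(chosen)
    return ordered


# Invented example data: four students, three courses, three lecturers.
triples = [
    ("alice", "type", "Student"), ("bob", "type", "Student"),
    ("carol", "type", "Student"), ("dave", "type", "Student"),
    ("alice", "takes", "db"), ("bob", "takes", "ml"),
    ("carol", "takes", "ml"), ("dave", "takes", "os"),
    ("db", "taughtBy", "prof_x"), ("ml", "taughtBy", "prof_y"),
    ("os", "taughtBy", "prof_z"),
]
query = [
    ("?s", "type", "Student"),
    ("?s", "takes", "?c"),
    ("?c", "taughtBy", "prof_x"),
]

solutions_before, cost_before = evaluate(query, triples)
solutions_after, cost_after = evaluate(reorder(query, triples), triples)
```

In this toy setup both orderings return the same solutions, but the reordered sequence starts from the most selective pattern and therefore carries fewer intermediate bindings between steps. On a MapReduce-based store, where each join step may be a separate job, smaller intermediate results would plausibly translate into less data shuffled between jobs.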
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Tripathi, N., Banerjee, S. (2014). SARROD: SPARQL Analyzer and Reordering for Runtime Optimization on Big Data. In: Srinivasa, S., Mehta, S. (eds) Big Data Analytics. BDA 2014. Lecture Notes in Computer Science, vol 8883. Springer, Cham. https://doi.org/10.1007/978-3-319-13820-6_17
DOI: https://doi.org/10.1007/978-3-319-13820-6_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-13819-0
Online ISBN: 978-3-319-13820-6
eBook Packages: Computer Science (R0)