SARROD: SPARQL Analyzer and Reordering for Runtime Optimization on Big Data

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8883)

Abstract

The Resource Description Framework (RDF) has been widely adopted for representing web resources and structured graph data. The evolution of Big Data poses scalability challenges in processing these graphs, as they may become extremely large. Growth in data size (and in turn graph size) has a compounding effect on the execution times of queries on the graph. Although big data platforms such as Hadoop can mitigate the problem, query ordering and the data flow between constraints present opportunities for further optimization. SPARQL, a widely used RDF query language, suffers from the same bottleneck on large graphs. There is hardly any established method to generate all equivalent reorderings of a SPARQL query containing joins, outer joins, and group-by aggregations. In this paper, we propose a query reordering algorithm, SARROD, that leverages graph properties that are simple to compute yet powerful for runtime optimization. Experimental results show that SARROD reduces response time for SPARQL queries on the order of 12% compared to the non-reordered sequence when executed over the SHARD graph-store (triple-store), which is built on the Hadoop implementation of MapReduce.
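
The abstract does not spell out which graph properties SARROD uses, so the following is only an illustrative sketch of what reordering the triple patterns of a SPARQL query can look like, not the paper's algorithm. It assumes a hypothetical table of per-predicate triple counts as the "simple to compute" statistic, a made-up cost function (`estimated_cost`), and a greedy strategy that prefers cheap patterns sharing a variable with those already placed. All names and numbers are assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple

# A triple pattern is (subject, predicate, object); variables start with "?".
Triple = Tuple[str, str, str]


def variables(pattern: Triple) -> Set[str]:
    """Return the variable terms that appear in a triple pattern."""
    return {term for term in pattern if term.startswith("?")}


def estimated_cost(pattern: Triple, predicate_counts: Dict[str, int]) -> float:
    """Hypothetical cost: number of triples with this predicate,
    divided by 10 for each constant subject or object."""
    s, p, o = pattern
    # Unknown or variable predicates fall back to the largest known count.
    worst = max(predicate_counts.values(), default=1)
    base = predicate_counts.get(p, worst)
    bound_terms = sum(1 for term in (s, o) if not term.startswith("?"))
    return base / (10 ** bound_terms)


def reorder(patterns: List[Triple], predicate_counts: Dict[str, int]) -> List[Triple]:
    """Greedily order patterns: cheapest first, preferring patterns that
    share a variable with those already chosen (avoids cross products)."""
    remaining = list(patterns)
    ordered: List[Triple] = []
    bound_vars: Set[str] = set()
    while remaining:
        connected = [p for p in remaining if variables(p) & bound_vars]
        candidates = connected or remaining
        best = min(candidates, key=lambda p: estimated_cost(p, predicate_counts))
        ordered.append(best)
        remaining.remove(best)
        bound_vars |= variables(best)
    return ordered


if __name__ == "__main__":
    # Made-up statistics and a LUBM-style query, for illustration only.
    counts = {"rdf:type": 1000, "ub:takesCourse": 500, "ub:advisor": 50}
    query = [
        ("?x", "rdf:type", "ub:GraduateStudent"),
        ("?x", "ub:takesCourse", "?c"),
        ("?x", "ub:advisor", "?y"),
    ]
    for pattern in reorder(query, counts):
        print(pattern)
```

In a clause-iteration engine such as SHARD, where each clause is processed in turn over MapReduce, placing selective clauses early would be expected to shrink the intermediate results passed between iterations; whether SARROD relies on this exact effect is not stated in the abstract.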

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Tripathi, N., Banerjee, S. (2014). SARROD: SPARQL Analyzer and Reordering for Runtime Optimization on Big Data. In: Srinivasa, S., Mehta, S. (eds) Big Data Analytics. BDA 2014. Lecture Notes in Computer Science, vol 8883. Springer, Cham. https://doi.org/10.1007/978-3-319-13820-6_17

  • DOI: https://doi.org/10.1007/978-3-319-13820-6_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-13819-0

  • Online ISBN: 978-3-319-13820-6

  • eBook Packages: Computer Science, Computer Science (R0)
