
Presto-RDF: SPARQL Querying over Big RDF Data

Conference paper in Databases Theory and Applications (ADC 2015)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9093)

Abstract

There has been a rapid increase in the amount of Resource Description Framework (RDF) data on the web. Processing large volumes of RDF data requires an efficient storage and query-processing engine that scales well with the volume of data. Over the past two and a half years, however, heavy users of big data systems, such as Facebook, have noted limitations in the query performance of these systems and have begun developing new distributed query engines for big data that do not rely on MapReduce. Facebook's Presto is one such engine. This paper proposes an architecture based on Presto, called Presto-RDF, that can be used to process big RDF data. It also presents an evaluation of Presto's performance in processing big RDF data against Apache Hive. The experimental results show that the Presto-RDF framework performs much better than both Apache Hive and the native RDF store 4Store, and that it can be used to process big RDF data.
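The abstract does not describe how Presto-RDF stores triples or translates SPARQL. A common general technique for running SPARQL on a SQL engine such as Presto or Hive is to keep every triple in a single table and rewrite each basic graph pattern as a self-join of that table. The following is a minimal sketch of that technique only, assuming a hypothetical `triples(s, p, o)` table in which every term, including prefixed names such as `rdf:type`, is stored as a plain string. The function name `bgp_to_sql` and the schema are illustrative assumptions, not the paper's design.

```python
from typing import Dict, List, Sequence, Tuple

# A triple pattern: (subject, predicate, object); terms starting with "?" are variables.
TriplePattern = Tuple[str, str, str]


def bgp_to_sql(patterns: Sequence[TriplePattern], table: str = "triples") -> str:
    """Rewrite a SPARQL basic graph pattern as a single SQL self-join.

    Assumes a hypothetical table `table(s, p, o)` holding one RDF triple
    per row, with every term stored as a plain string (no IRI expansion).
    """
    if not patterns:
        raise ValueError("a basic graph pattern needs at least one triple pattern")

    bindings: Dict[str, str] = {}  # variable name -> first column that binds it
    tables: List[str] = []         # one aliased copy of the table per pattern
    conditions: List[str] = []     # join conditions and constant filters

    for i, pattern in enumerate(patterns):
        alias = f"t{i}"
        tables.append(f"{table} {alias}")
        for column, term in zip(("s", "p", "o"), pattern):
            ref = f"{alias}.{column}"
            if term.startswith("?"):
                var = term[1:]
                if var in bindings:
                    # Repeated variable: equate this column with its first binding.
                    conditions.append(f"{ref} = {bindings[var]}")
                else:
                    bindings[var] = ref
            else:
                # Constant term: filter on it, doubling single quotes for SQL literals.
                literal = term.replace("'", "''")
                conditions.append(f"{ref} = '{literal}'")

    # Project each variable once; fall back to SELECT 1 when there are no variables.
    projection = ", ".join(f"{ref} AS {var}" for var, ref in bindings.items()) or "1"
    sql = f"SELECT {projection} FROM {', '.join(tables)}"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql


if __name__ == "__main__":
    query = bgp_to_sql([
        ("?article", "rdf:type", "bench:Article"),
        ("?article", "dc:creator", "?author"),
    ])
    print(query)
```

For the two-pattern query `?article rdf:type bench:Article . ?article dc:creator ?author`, the shared variable `?article` becomes the join condition `t1.s = t0.s`, and each constant becomes a filter. Because every additional pattern adds another self-join over the full triple table, many RDF systems instead use alternative layouts such as vertical partitioning; the single-table form is shown here only because it is the simplest to follow.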



Author information

Correspondence to Mulugeta Mammo or Srividya K. Bansal.


Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Mammo, M., Bansal, S.K. (2015). Presto-RDF: SPARQL Querying over Big RDF Data. In: Sharaf, M., Cheema, M., Qi, J. (eds) Databases Theory and Applications. ADC 2015. Lecture Notes in Computer Science, vol 9093. Springer, Cham. https://doi.org/10.1007/978-3-319-19548-3_23


  • DOI: https://doi.org/10.1007/978-3-319-19548-3_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-19547-6

  • Online ISBN: 978-3-319-19548-3

  • eBook Packages: Computer Science, Computer Science (R0)
