
Scalable SAPRQL Querying Processing on Large RDF Data in Cloud Computing Environment

  • Conference paper
Pervasive Computing and the Networked World (ICPCA/SWS 2012)

Part of the book series: Lecture Notes in Computer Science (LNCCN, volume 7719)

Abstract

Recently, the flexibility of the RDF data model has led an increasing number of organizations and communities to make their data available in RDF format, creating a growing need to query these data in a scalable and efficient way. MapReduce is a parallel processing framework for large data-intensive workloads, but it does not directly support join-intensive workloads. In this paper, we present a schema-based hybrid partitioning technique that places RDF triples according to the relationships between them, reducing the number of MapReduce (MR) cycles required by each SPARQL query job. We then propose a lightweight sideways information passing technique that passes join information across MR jobs to shrink the intermediate results involved in join operations. Experimental results on the LUBM benchmark show that our approaches achieve a substantial performance improvement, outperforming the previous system by a factor of 2 to 20.
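The abstract does not specify the partitioning scheme or the exact form of the join information passed between jobs. As a rough illustration only, the Python sketch below simulates both ideas on hypothetical LUBM-style triples. It hash-partitions triples by subject (a simplified stand-in, not the authors' schema-based hybrid scheme) so that a star pattern can be joined inside each partition without a shuffle, then passes the set of join-key bindings produced by the first job to the map phase of the second job so that non-joining triples are dropped before the shuffle. All function names and data are hypothetical, and an exact Python set stands in for the more compact summary (for example, a Bloom filter) a real system might ship.

```python
import zlib
from collections import defaultdict

# Hypothetical toy triples using LUBM-style predicate names (not real benchmark data).
TRIPLES = [
    ("s1", "type", "GraduateStudent"),
    ("s1", "memberOf", "dept0"),
    ("s2", "type", "GraduateStudent"),
    ("s2", "memberOf", "dept1"),
    ("s3", "type", "UndergraduateStudent"),
    ("s3", "memberOf", "dept0"),
    ("dept0", "subOrganizationOf", "univ0"),
    ("dept1", "subOrganizationOf", "univ1"),
    ("dept2", "subOrganizationOf", "univ0"),
]


def partition_by_subject(triples, n_parts):
    """Co-locate all triples with the same subject (a simplified stand-in for
    the paper's schema-based hybrid partitioning, whose details are not given)."""
    parts = [[] for _ in range(n_parts)]
    for s, p, o in triples:
        parts[zlib.crc32(s.encode()) % n_parts].append((s, p, o))
    return parts


def star_join_local(part):
    """Map-only star join for: ?x type GraduateStudent . ?x memberOf ?d.
    No shuffle is needed because every triple about ?x is in this partition."""
    by_subject = defaultdict(lambda: defaultdict(set))
    for s, p, o in part:
        by_subject[s][p].add(o)
    return [
        (s, d)
        for s, props in by_subject.items()
        if "GraduateStudent" in props["type"]
        for d in props["memberOf"]
    ]


def map_with_sip(part, join_keys):
    """Map phase of the second job: emit (?d, ?u) only if ?d was produced by
    the first job, so non-joining triples never reach the shuffle."""
    return [(s, o) for s, p, o in part if p == "subOrganizationOf" and s in join_keys]


def run_query(triples, n_parts=2):
    """SELECT ?x ?d ?u WHERE { ?x type GraduateStudent . ?x memberOf ?d .
    ?d subOrganizationOf ?u }, evaluated as two simulated MR jobs."""
    parts = partition_by_subject(triples, n_parts)
    # Job 1: star join answered locally in each partition.
    students = [row for part in parts for row in star_join_local(part)]
    # Sideways information: the ?d bindings produced by job 1.
    join_keys = {d for _, d in students}
    # Job 2, map + shuffle: group the surviving triples by the join key ?d.
    shuffled = defaultdict(list)
    emitted = 0
    for part in parts:
        for d, u in map_with_sip(part, join_keys):
            shuffled[d].append(u)
            emitted += 1
    # Job 2, reduce: join job-1 rows with the shuffled groups on ?d.
    result = sorted((x, d, u) for x, d in students for u in shuffled[d])
    return result, emitted


if __name__ == "__main__":
    result, emitted = run_query(TRIPLES)
    print(result)   # [('s1', 'dept0', 'univ0'), ('s2', 'dept1', 'univ1')]
    print(emitted)  # 2
```

Only two of the three `subOrganizationOf` triples reach the shuffle, because `dept2` never appears among the first job's bindings; on realistic data, this kind of pruning is what reduces the intermediate results the abstract refers to. The query answer does not depend on `n_parts`, since the subject hash always keeps a star pattern's triples together.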




Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wu, B., Jin, H., Yuan, P. (2013). Scalable SAPRQL Querying Processing on Large RDF Data in Cloud Computing Environment. In: Zu, Q., Hu, B., Elçi, A. (eds) Pervasive Computing and the Networked World. ICPCA/SWS 2012. Lecture Notes in Computer Science, vol 7719. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37015-1_55


  • DOI: https://doi.org/10.1007/978-3-642-37015-1_55

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-37014-4

  • Online ISBN: 978-3-642-37015-1

  • eBook Packages: Computer Science, Computer Science (R0)
