Scalable SAPRQL Querying Processing on Large RDF Data in Cloud Computing Environment

Wu, Buwen; Jin, Hai; Yuan, Pingpeng

doi:10.1007/978-3-642-37015-1_55


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 7719)

Included in the following conference series:

Joint International Conference on Pervasive Computing and the Networked World


Abstract

Recently, the flexibility of the RDF data model has led an increasing number of organizations and communities to make their data available in RDF format, creating a growing need to query these data in a scalable and efficient way. MapReduce (MR) is a parallel processing framework for large data-intensive workloads, but it does not directly support join-intensive workloads. In this paper, we present a schema-based hybrid partitioning technique that places RDF triples according to the relationships between them, reducing the number of MR cycles required by each SPARQL query job. We then propose a lightweight sideways information passing technique that passes join information across MR jobs to reduce the intermediate results involved in join operations. Experimental results on the LUBM benchmark show that our approaches achieve a substantial performance improvement, outperforming the previous system by a factor of 2 to 20.
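The abstract does not say how join information is encoded when it is passed across MR jobs. The following minimal Python sketch illustrates the general idea under stated assumptions. It assumes a Bloom filter over join-key values, which is a common choice for sideways information passing but not necessarily the authors' structure. The in-memory `run_mapreduce` helper, the `select_job` and `join_job` functions, and the sample triples (loosely modeled on LUBM's `type` and `takesCourse` predicates) are all hypothetical. The first job binds `?x`, a filter is built from those bindings, and the second job's mappers drop triples that cannot join before the shuffle.

```python
import hashlib
from collections import defaultdict


class BloomFilter:
    """Compact approximate set: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


def run_mapreduce(records, mapper, reducer):
    """Hypothetical in-memory stand-in for one MR job; counts shuffled records."""
    groups = defaultdict(list)
    shuffled = 0
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
            shuffled += 1
    output = []
    for key, values in groups.items():
        output.extend(reducer(key, values))
    return output, shuffled


def select_job(triples):
    """Job 1 (hypothetical): bind ?x for the pattern (?x, type, GraduateStudent)."""
    def mapper(triple):
        s, p, o = triple
        if p == "type" and o == "GraduateStudent":
            yield s, None

    def reducer(key, values):
        yield key

    return run_mapreduce(triples, mapper, reducer)


def join_job(students, triples, key_filter=None):
    """Job 2 (hypothetical): reduce-side join of ?x with (?x, takesCourse, ?c)."""
    def mapper(record):
        tag, payload = record
        if tag == "student":
            yield payload, ("student", None)
            return
        s, p, o = payload
        if p != "takesCourse":
            return
        # Sideways information passing: skip triples whose join key cannot
        # match Job 1's output, so they are never shuffled to reducers.
        if key_filter is not None and not key_filter.might_contain(s):
            return
        yield s, ("course", o)

    def reducer(key, values):
        # The exact join here also discards any Bloom-filter false positives.
        if any(tag == "student" for tag, _ in values):
            for tag, course in values:
                if tag == "course":
                    yield key, course

    records = [("student", x) for x in students]
    records += [("triple", t) for t in triples]
    return run_mapreduce(records, mapper, reducer)


triples = [
    ("alice", "type", "GraduateStudent"),
    ("bob", "type", "Undergraduate"),
    ("carol", "type", "Undergraduate"),
    ("alice", "takesCourse", "db101"),
    ("bob", "takesCourse", "db101"),
    ("carol", "takesCourse", "os201"),
]

students, _ = select_job(triples)
bloom = BloomFilter()
for x in students:
    bloom.add(x)

plain, shuffled_plain = join_job(students, triples)
pruned, shuffled_pruned = join_job(students, triples, key_filter=bloom)
print(plain, shuffled_plain)  # [('alice', 'db101')] 4
print(pruned, shuffled_pruned)
```

The pruned run shuffles fewer records whenever the filter rejects non-matching subjects. The reducer's exact join keeps the result identical even if the filter admits a false positive.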



Author information

Authors and Affiliations

Services Computing Technology and System Lab, Cluster and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, 430074, China
Buwen Wu, Hai Jin & Pingpeng Yuan


Editor information

Editors and Affiliations

Wuhan University of Technology, Heping Road 1178, Wuchang District, 430081, Wuhan, Hubei, China
Qiaohong Zu
Hayes Park Central, Fujitsu Laboratories of Europe Ltd., Hayes End Road, UB4 8FE, Hayes, Middlesex, UK
Bo Hu
Department of Electrical and Electronics Engineering, Aksaray University, Merkez Kampüsü, 68100, Aksaray, Turkey
Atilla Elçi


About this paper

Cite this paper

Wu, B., Jin, H., Yuan, P. (2013). Scalable SAPRQL Querying Processing on Large RDF Data in Cloud Computing Environment. In: Zu, Q., Hu, B., Elçi, A. (eds) Pervasive Computing and the Networked World. ICPCA/SWS 2012. Lecture Notes in Computer Science, vol 7719. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37015-1_55


DOI: https://doi.org/10.1007/978-3-642-37015-1_55
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37014-4
Online ISBN: 978-3-642-37015-1
eBook Packages: Computer Science, Computer Science (R0)
