DOI: 10.1145/1999299.1999303
research-article

PigSPARQL: mapping SPARQL to Pig Latin

Published: 12 June 2011

Abstract

In this paper we investigate the scalable processing of complex SPARQL queries on very large RDF datasets. As the underlying platform we use Apache Hadoop, an open-source implementation of Google's MapReduce for massively parallel computation on a computer cluster. We introduce PigSPARQL, a system that processes complex SPARQL queries on a MapReduce cluster. To this end, SPARQL queries are translated into Pig Latin, a data analysis language developed by Yahoo! Research. Pig Latin programs are executed as a series of MapReduce jobs on a Hadoop cluster. We evaluate the processing of SPARQL queries with PigSPARQL using SP2Bench, a SPARQL-specific performance benchmark, and demonstrate that PigSPARQL enables scalable execution of SPARQL queries on Hadoop without any additional programming effort.
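
To make the idea of translating SPARQL into Pig Latin more concrete, the following is a minimal sketch of how a single triple pattern might be rendered as Pig Latin text. It is not the PigSPARQL translation itself, which this abstract does not specify. The function name `triple_pattern_to_pig`, the relation name `triples`, and the column names `s`, `p`, `o` are all assumptions made for illustration; only the `FILTER` and `FOREACH ... GENERATE` operators are standard Pig Latin.

```python
from typing import Tuple

# Assumed column names of a hypothetical Pig relation holding RDF triples.
COLUMNS = ("s", "p", "o")


def is_var(term: str) -> bool:
    """A SPARQL variable is written with a leading '?'."""
    return term.startswith("?")


def triple_pattern_to_pig(alias: str,
                          pattern: Tuple[str, str, str],
                          source: str = "triples") -> str:
    """Render one triple pattern as Pig Latin statements (illustrative only).

    Bound positions become FILTER conditions; variable positions are
    projected and renamed with FOREACH ... GENERATE.
    """
    conditions = [f"{col} == '{term}'"
                  for col, term in zip(COLUMNS, pattern) if not is_var(term)]
    projections = [f"{col} AS {term[1:]}"
                   for col, term in zip(COLUMNS, pattern) if is_var(term)]
    if not projections:
        raise ValueError("this sketch expects at least one variable")

    lines = []
    if conditions:
        # Keep only the triples that match the bound positions.
        lines.append(f"{alias}_f = FILTER {source} BY {' AND '.join(conditions)};")
        source = f"{alias}_f"
    # Project the variable positions under their SPARQL variable names.
    lines.append(f"{alias} = FOREACH {source} GENERATE {', '.join(projections)};")
    return "\n".join(lines)


if __name__ == "__main__":
    print(triple_pattern_to_pig("t0", ("?article", "rdf:type", "bench:Article")))
```

In a sketch like this, patterns that share a variable would then be combined with a Pig Latin `JOIN` on the shared column, and Pig would compile the resulting program into MapReduce jobs.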

Published In

SWIM '11: Proceedings of the International Workshop on Semantic Web Information Management
June 2011
61 pages
ISBN:9781450306515
DOI:10.1145/1999299

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

SIGMOD/PODS '11
