research-article

SPARQL basic graph pattern processing with iterative MapReduce

Authors:
Jaeseok Myung, Seoul National University, Seoul, Republic of Korea
Jongheum Yeon, Seoul National University, Seoul, Republic of Korea
Sang-goo Lee, Seoul National University, Seoul, Republic of Korea

MDAC '10: Proceedings of the 2010 Workshop on Massive Data Analytics on the Cloud, April 2010, Article No. 6, pages 1–6
https://doi.org/10.1145/1779599.1779605

Published: 26 April 2010

ABSTRACT

There have been a number of approaches that adopt the RDF data model and the MapReduce framework for data warehousing, since the data model is well suited to data integration and the framework is well suited to large-scale, fault-tolerant data analysis. Nevertheless, most approaches treat the data model and the framework separately, and it has been difficult to combine their strengths because few algorithms connect the two. In this paper, we present a general and efficient MapReduce algorithm for the SPARQL Basic Graph Pattern (BGP), which is a set of triple patterns to be joined. In MapReduce, join operations are known to require computationally expensive MapReduce iterations. We therefore minimize the number of iterations in two ways. First, we adapt the traditional multi-way join to MapReduce instead of performing multiple individual joins. Second, by analyzing the given query, we select a good join key to avoid unnecessary iterations. As a result, the algorithm shows good performance and scalability with respect to execution time and data size.
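The abstract names two ideas without spelling them out: joining every triple pattern that shares a variable in a single MapReduce job (a multi-way join), and choosing the join key so that fewer jobs are needed. The Python sketch below simulates both in memory under our own assumptions; it is not the authors' algorithm. The greedy rule in the hypothetical `choose_join_key` (pick the variable shared by the most remaining relations) is an assumed heuristic, and `map_reduce_join` and `evaluate_bgp` are illustrative names that model one map/shuffle/reduce round with plain dictionaries rather than Hadoop. The example triples are made up in a LUBM-like style.

```python
from collections import defaultdict
from itertools import product

def is_var(term):
    """SPARQL-style variables start with '?'; other terms are constants."""
    return term.startswith("?")

def match(pattern, triple):
    """Return variable bindings if `triple` matches the triple pattern, else None."""
    binding = {}
    for p, t in zip(pattern, triple):
        if is_var(p):
            if binding.get(p, t) != t:  # a repeated variable must bind one value
                return None
            binding[p] = t
        elif p != t:
            return None
    return binding

def merge(bindings):
    """Combine partial bindings; None if they disagree on a shared variable."""
    out = {}
    for b in bindings:
        for var, val in b.items():
            if out.setdefault(var, val) != val:
                return None
    return out

def choose_join_key(relations):
    """Hypothetical heuristic: the variable shared by the most relations."""
    counts = defaultdict(int)
    for vars_, _ in relations:
        for v in vars_:
            counts[v] += 1
    key, n = max(sorted(counts.items()), key=lambda kv: kv[1])
    if n < 2:
        raise ValueError("disconnected BGP: no variable links the remaining relations")
    return key

def map_reduce_join(relations, key):
    """One simulated MapReduce iteration: multi-way join of all relations containing `key`."""
    joined = [i for i, (vars_, _) in enumerate(relations) if key in vars_]
    # Map + shuffle: group each binding by its join-key value, tagged by relation.
    groups = defaultdict(lambda: defaultdict(list))
    for i in joined:
        for b in relations[i][1]:
            groups[b[key]][i].append(b)
    # Reduce: per key value, combine one binding from every joined relation.
    rows = []
    for by_rel in groups.values():
        if len(by_rel) < len(joined):
            continue  # some relation has no binding with this key value
        for combo in product(*(by_rel[i] for i in joined)):
            merged = merge(combo)
            if merged is not None:
                rows.append(merged)
    new_vars = frozenset().union(*(relations[i][0] for i in joined))
    rest = [r for i, r in enumerate(relations) if i not in joined]
    return rest + [(new_vars, rows)]

def evaluate_bgp(patterns, triples):
    """Evaluate a BGP; return (solutions, number of MapReduce iterations)."""
    # Selection: one relation of bindings per triple pattern (assumed to be
    # folded into the first map phase, so it is not counted as an iteration).
    relations = []
    for pat in patterns:
        rows = [b for tr in triples if (b := match(pat, tr)) is not None]
        relations.append((frozenset(t for t in pat if is_var(t)), rows))
    iterations = 0
    while len(relations) > 1:
        relations = map_reduce_join(relations, choose_join_key(relations))
        iterations += 1
    return relations[0][1], iterations

triples = [
    ("alice", "type", "GraduateStudent"), ("bob", "type", "GraduateStudent"),
    ("alice", "memberOf", "dept0"), ("bob", "memberOf", "dept1"),
    ("alice", "takesCourse", "course1"), ("bob", "takesCourse", "course2"),
    ("dept0", "subOrganizationOf", "univ0"),
]
query = [("?x", "type", "GraduateStudent"), ("?x", "memberOf", "?d"), ("?x", "takesCourse", "?c")]
rows, iters = evaluate_bgp(query, triples)
print(iters)  # → 1
```

Because all three patterns share `?x`, the sketch joins them in one iteration, whereas a sequence of two-way joins would need two. A query whose patterns are linked by different variables, such as `?x memberOf ?d` and `?d subOrganizationOf ?u`, still needs more than one iteration in this sketch, which is where the choice of join key affects the iteration count.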


Index Terms

1. Information systems
  1. Data management systems
    1. Database management system engines
      1. Database query processing
      2. Parallel and distributed DBMSs
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Database theory
      1. Database query processing and optimization (theory)


Published in
MDAC '10: Proceedings of the 2010 Workshop on Massive Data Analytics on the Cloud
April 2010
53 pages
ISBN: 9781605589916
DOI: 10.1145/1779599
Conference Chairs:
Ullas Nambiar, IBM India Research Lab, New Delhi, India
John McPherson, IBM Almaden Research Center
David Konopnicki, IBM Haifa Research Lab, Israel
Copyright © 2010 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
MapReduce
RDF
SPARQL
basic graph pattern
cloud computing
data warehouse
query processing

Article Metrics
- 45 total citations
- 1,155 total downloads
- Downloads (last 12 months): 6
- Downloads (last 6 weeks): 1