ABSTRACT
There have been a number of attempts to adopt the RDF data model and the MapReduce framework for data warehousing, since the data model is well suited to data integration and the framework to large-scale, fault-tolerant data analysis. Nevertheless, most approaches consider the data model and the framework separately. Creating synergy has been difficult because only a few algorithms connect the two. In this paper, we offer a general and efficient MapReduce algorithm for the SPARQL Basic Graph Pattern, which is a set of triple patterns to be joined. In MapReduce, the join operation is known to require computationally expensive MapReduce iterations. For this reason, we minimize the number of iterations in two ways. First, we bring the traditional multi-way join into MapReduce instead of using multiple individual joins. Second, by analyzing a given query, we select a good join key to avoid unnecessary iterations. As a result, the algorithm shows good performance and scalability in terms of time and data size.
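The two ideas in the abstract, joining several triple patterns in one pass and choosing the join key from the query, can be illustrated with a small in-memory sketch. This is not the paper's algorithm: the sample triples, the function names (`match`, `pick_join_var`, `map_phase`, `reduce_phase`, `multiway_join`) and the "most shared variable" heuristic are all assumptions made for illustration, and the shuffle step is simulated with a dictionary rather than a real MapReduce runtime.

```python
from collections import Counter, defaultdict
from itertools import product

def match(pattern, triple):
    """Return variable bindings if the triple matches the pattern, else None."""
    binding = {}
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if binding.get(p, t) != t:  # same variable bound to two values
                return None
            binding[p] = t
        elif p != t:
            return None
    return binding

def pick_join_var(patterns):
    """Hypothetical heuristic: the variable shared by the most patterns."""
    counts = Counter(
        term for p in patterns for term in set(p) if term.startswith("?")
    )
    return max(sorted(counts), key=counts.get)

def map_phase(triples, patterns, join_var):
    """Emit (join-key value, (pattern index, binding)) for every match."""
    for triple in triples:
        for idx, pattern in enumerate(patterns):
            binding = match(pattern, triple)
            if binding is not None and join_var in binding:
                yield binding[join_var], (idx, binding)

def reduce_phase(values, n_patterns):
    """Join all patterns at once for one key value (multi-way join)."""
    by_pattern = defaultdict(list)
    for idx, binding in values:
        by_pattern[idx].append(binding)
    if len(by_pattern) < n_patterns:
        return  # some pattern has no match for this key
    for combo in product(*(by_pattern[i] for i in range(n_patterns))):
        merged, consistent = {}, True
        for binding in combo:
            for var, val in binding.items():
                if merged.setdefault(var, val) != val:
                    consistent = False
        if consistent:
            yield merged

def multiway_join(triples, patterns, join_var):
    """One simulated MapReduce iteration: map, group by key, reduce."""
    groups = defaultdict(list)  # stands in for the shuffle step
    for key, value in map_phase(triples, patterns, join_var):
        groups[key].append(value)
    results = []
    for values in groups.values():
        results.extend(reduce_phase(values, len(patterns)))
    return results

# Illustrative data, not taken from the paper.
triples = [
    ("alice", "type", "Student"),
    ("alice", "takes", "db101"),
    ("alice", "advisor", "bob"),
    ("carol", "type", "Student"),
    ("carol", "takes", "ml201"),
    ("dave", "takes", "db101"),
]
patterns = [
    ("?x", "type", "Student"),
    ("?x", "takes", "?c"),
    ("?x", "advisor", "?p"),
]

join_var = pick_join_var(patterns)
print(join_var, multiway_join(triples, patterns, join_var))
```

Because all three patterns share `?x`, a single simulated iteration joins them together, whereas a chain of pairwise joins would need two. Patterns that do not contain the chosen variable are ignored by this sketch and would need a further iteration.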
Index Terms
- SPARQL basic graph pattern processing with iterative MapReduce