DOI: 10.1145/1779599.1779605
research-article

SPARQL basic graph pattern processing with iterative MapReduce

Published: 26 April 2010

ABSTRACT

There have been a number of approaches to adopting the RDF data model and the MapReduce framework for data warehousing, since the data model is well suited to data integration and the processing framework to large-scale, fault-tolerant data analysis. Nevertheless, most approaches consider the data model and the framework separately. Creating synergy has been difficult because only a few algorithms connect the two. In this paper, we offer a general and efficient MapReduce algorithm for the SPARQL Basic Graph Pattern, which is a set of triple patterns to be joined. In MapReduce, the join operation is known to require computationally expensive MapReduce iterations. We therefore minimize the number of iterations in two ways. First, we adopt a traditional multi-way join in MapReduce instead of multiple individual joins. Second, by analyzing a given query, we select a good join key to avoid unnecessary iterations. As a result, the algorithm shows good performance and scalability in terms of time and data size.
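
The two ideas in the abstract can be illustrated with a small in-memory sketch. This is not the paper's algorithm, whose details are not given on this page. It only shows, under simplifying assumptions, how a single map/reduce pass can join several triple patterns that share one variable, and how a join key might be picked by counting variable occurrences. All names (`pick_join_key`, `join_iteration`, the sample triples) are hypothetical, and the most-frequent-variable heuristic is an assumption made for illustration.

```python
from collections import Counter, defaultdict
from itertools import product


def is_var(term):
    """SPARQL-style variables start with '?' in this sketch."""
    return term.startswith("?")


def match(pattern, triple):
    """Return the variable bindings if triple matches pattern, else None."""
    binding = {}
    for p, t in zip(pattern, triple):
        if is_var(p):
            if binding.get(p, t) != t:  # same variable bound to two values
                return None
            binding[p] = t
        elif p != t:  # constant term does not match
            return None
    return binding


def pick_join_key(patterns):
    """Assumed heuristic: the variable shared by the most triple patterns."""
    counts = Counter(t for p in patterns for t in set(p) if is_var(t))
    return max(sorted(counts), key=counts.get)  # sorted() makes ties stable


def join_iteration(triples, patterns, key_var):
    """One simulated MapReduce pass: multi-way join of patterns on key_var."""
    joinable = [p for p in patterns if key_var in p]

    # Map: emit (value bound to key_var, (pattern index, binding)).
    grouped = defaultdict(list)
    for triple in triples:
        for i, pattern in enumerate(joinable):
            b = match(pattern, triple)
            if b is not None:
                grouped[b[key_var]].append((i, b))

    # Reduce: for each key, combine one binding per pattern.
    results = []
    for values in grouped.values():
        per_pattern = [[] for _ in joinable]
        for i, b in values:
            per_pattern[i].append(b)
        for combo in product(*per_pattern):  # empty if any pattern is unmatched
            merged, consistent = {}, True
            for b in combo:
                for var, val in b.items():
                    if merged.setdefault(var, val) != val:
                        consistent = False
            if consistent:
                results.append(merged)
    return results


# Hypothetical sample data: a star-shaped pattern around ?x.
triples = [
    ("alice", "type", "Student"),
    ("alice", "takes", "db101"),
    ("alice", "advisor", "carol"),
    ("bob", "type", "Student"),
    ("bob", "takes", "db101"),
    ("carol", "type", "Professor"),
]
patterns = [
    ("?x", "type", "Student"),
    ("?x", "takes", "?c"),
    ("?x", "advisor", "?a"),
]

key = pick_join_key(patterns)
results = join_iteration(triples, patterns, key)
print(key, results)
```

Here all three patterns share `?x`, so one pass joins them together, where pairwise joins would need two passes. Only `alice` satisfies every pattern. Patterns that do not contain the chosen key are simply skipped in this sketch, and a fuller evaluator would need further iterations to join them.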

Published in

MDAC '10: Proceedings of the 2010 Workshop on Massive Data Analytics on the Cloud
April 2010
53 pages
ISBN: 9781605589916
DOI: 10.1145/1779599

Copyright © 2010 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery
New York, NY, United States
