DOI: 10.1145/2695664.2695720
research-article

Processing billions of RDF triples on a single machine using streaming and sorting

Published: 13 April 2015

ABSTRACT

We consider the feasibility of processing billions of RDF triples on a single commodity machine using streaming and sorting techniques, focusing on RDF processing tasks relevant for Linked Data consumption: data filtering and transformation, RDFS inference, owl:sameAs smushing and statistics extraction. To investigate this research question we built RDFpro (RDF Processor), an open source tool that provides streaming and sorting-based processors for the considered tasks and allows their sequential and parallel composition in complex pipelines. An empirical evaluation of RDFpro in four application scenarios (dataset analysis, filtering, merging and massaging) shows the effectiveness of the tool and allows us to answer our research question positively.
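
To make the idea of composing streaming and sorting-based processors concrete, here is a minimal Python sketch. It is not RDFpro's API: the names `keep_predicates`, `smush`, `sort_unique` and `sequence` are hypothetical, triples are plain string tuples, the `canonical` map is assumed to have been built beforehand from the owl:sameAs statements, and the in-memory `sorted` call stands in for the disk-based sort that billions of triples would need.

```python
from typing import Callable, Dict, Iterable, Iterator, Set, Tuple

Triple = Tuple[str, str, str]
Processor = Callable[[Iterable[Triple]], Iterator[Triple]]


def keep_predicates(allowed: Set[str]) -> Processor:
    """Streaming filter: pass through only triples whose predicate is allowed."""
    def run(triples: Iterable[Triple]) -> Iterator[Triple]:
        return (t for t in triples if t[1] in allowed)
    return run


def smush(canonical: Dict[str, str]) -> Processor:
    """Streaming rewrite: replace each alias with its canonical identifier.

    `canonical` is an assumed, precomputed alias -> representative map.
    """
    def run(triples: Iterable[Triple]) -> Iterator[Triple]:
        for s, p, o in triples:
            yield (canonical.get(s, s), p, canonical.get(o, o))
    return run


def sort_unique(triples: Iterable[Triple]) -> Iterator[Triple]:
    """Sorting-based step: after sorting, duplicates are adjacent and can be
    dropped in one pass. In-memory here; an assumption made for brevity."""
    previous = None
    for t in sorted(triples):
        if t != previous:
            yield t
        previous = t


def sequence(*processors: Processor) -> Processor:
    """Sequential composition: feed each processor's output into the next."""
    def run(triples: Iterable[Triple]) -> Iterator[Triple]:
        for processor in processors:
            triples = processor(triples)
        return iter(triples)
    return run


# Illustrative data only; ex:b is assumed to be declared owl:sameAs ex:a.
data = [
    ("ex:a", "rdfs:label", "A"),
    ("ex:b", "rdfs:label", "A"),
    ("ex:a", "ex:other", "x"),
]
pipeline = sequence(
    keep_predicates({"rdfs:label"}),
    smush({"ex:b": "ex:a"}),
    sort_unique,
)
result = list(pipeline(data))
```

In this sketch the filter and the rewrite never hold more than one triple at a time, and only the sort step needs to see the whole stream, which is the division of labour the abstract's "streaming and sorting" phrasing suggests.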


        • Published in

          SAC '15: Proceedings of the 30th Annual ACM Symposium on Applied Computing
          April 2015
          2418 pages
          ISBN: 9781450331968
          DOI: 10.1145/2695664

          Copyright © 2015 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States


          Acceptance Rates

          SAC '15 Paper Acceptance Rate: 291 of 1,211 submissions, 24%
          Overall Acceptance Rate: 1,650 of 6,669 submissions, 25%
