DOI: 10.1145/2245276.2245343
research-article

Compression of RDF dictionaries

Published: 26 March 2012

Abstract

Dictionaries are commonly used by applications that operate on huge RDF datasets. A dictionary replaces the long terms occurring in RDF triples with short IDs that reference them. This greatly compacts the dataset and thus mitigates its scalability issues. However, the dictionary itself is not negligible in size, and the techniques used to represent it also suffer from scalability limitations. This paper addresses this problem by adapting compression techniques for string dictionaries to the case of RDF. We propose a novel technique, Dcomp, which can be tuned to represent the dictionary in compressed space (22–64%) and to run in a few microseconds (1–50 μs).
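The term-to-ID replacement described above can be illustrated with a minimal, uncompressed sketch. This is not Dcomp and makes no claim about how the paper represents its dictionary. It assumes that IDs are assigned by each term's 1-based rank in sorted order and that subjects, predicates and objects share a single ID space (a simplification). The names `TermDictionary`, `locate`, `extract` and `encode_triples` are hypothetical, chosen only for this illustration.

```python
from bisect import bisect_left


class TermDictionary:
    """Hypothetical, uncompressed baseline mapping RDF terms to integer IDs.

    Assumption: terms are kept sorted, so a term's ID is its 1-based rank.
    This makes term -> ID a binary search and ID -> term a direct index.
    """

    def __init__(self, terms):
        # Deduplicate and sort; a term's position in this list defines its ID.
        self._terms = sorted(set(terms))

    def __len__(self):
        return len(self._terms)

    def locate(self, term):
        """Return the ID of `term`, or 0 if the term is not in the dictionary."""
        i = bisect_left(self._terms, term)
        if i < len(self._terms) and self._terms[i] == term:
            return i + 1
        return 0

    def extract(self, term_id):
        """Return the term whose (1-based) ID is `term_id`."""
        if not 1 <= term_id <= len(self._terms):
            raise KeyError(term_id)
        return self._terms[term_id - 1]


def encode_triples(triples):
    """Build one shared dictionary and rewrite every triple as a triple of IDs."""
    dictionary = TermDictionary(term for triple in triples for term in triple)
    id_triples = [tuple(dictionary.locate(t) for t in triple) for triple in triples]
    return dictionary, id_triples


# Usage with two illustrative triples (terms written in N-Triples-like syntax).
triples = [
    ("<http://example.org/alice>", "<http://xmlns.com/foaf/0.1/knows>", "<http://example.org/bob>"),
    ("<http://example.org/bob>", "<http://xmlns.com/foaf/0.1/name>", '"Bob"'),
]
dictionary, id_triples = encode_triples(triples)
print(id_triples)  # [(2, 4, 3), (3, 5, 1)]
```

Each long IRI or literal is stored once, and the triples shrink to small integers. The sorted term list itself, however, still grows with the number of distinct terms, which is the dictionary-size problem the abstract refers to.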


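Cited By

  • (2024) Generate and Update Large HDT RDF Knowledge Graphs on Commodity Hardware. The Semantic Web, pp. 128–144. DOI: 10.1007/978-3-031-60635-9_8. Online publication date: 26-May-2024
  • (2023) Graph pattern detection and structural redundancy reduction to compress named graphs. Information Sciences, 119428. DOI: 10.1016/j.ins.2023.119428. Online publication date: Jul-2023
  • (2022) gRDF: An Efficient Compressor with Reduced Structural Regularities That Utilizes gRePair. Sensors, 22:7, 2545. DOI: 10.3390/s22072545. Online publication date: 26-Mar-2022
  • (2022) Knowledge Graph Compression for Big Semantic Data. Encyclopedia of Big Data Technologies, pp. 1–13. DOI: 10.1007/978-3-319-63962-8_62-2. Online publication date: 17-Mar-2022
  • (2021) Applying Grammar-Based Compression to RDF. The Semantic Web, pp. 93–108. DOI: 10.1007/978-3-030-77385-4_6. Online publication date: 31-May-2021
  • (2019) RDF Compression. Encyclopedia of Big Data Technologies, pp. 1368–1378. DOI: 10.1007/978-3-319-77525-8_62. Online publication date: 20-Feb-2019
  • (2018) RDF Compression. Encyclopedia of Big Data Technologies, pp. 1–11. DOI: 10.1007/978-3-319-63962-8_62-1. Online publication date: 1-Feb-2018
  • (2017) ARVIDA-Referenzarchitektur. Web-basierte Anwendungen Virtueller Techniken, pp. 117–191. DOI: 10.1007/978-3-662-52956-0_3. Online publication date: 7-Apr-2017
  • (2016) A Scalable Approach for Computing Semantic Relatedness using Semantic Web Data. Proceedings of the 6th International Conference on Web Intelligence, Mining and Semantics, pp. 1–9. DOI: 10.1145/2912845.2912864. Online publication date: 13-Jun-2016
  • (2016) Compressing graphs by grammars. 2016 IEEE 32nd International Conference on Data Engineering (ICDE), pp. 109–120. DOI: 10.1109/ICDE.2016.7498233. Online publication date: May-2016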

Published In

SAC '12: Proceedings of the 27th Annual ACM Symposium on Applied Computing
March 2012
2179 pages
ISBN:9781450308571
DOI:10.1145/2245276
Conference Chairs: Sascha Ossowski, Paola Lecca
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. RDF dictionaries
  2. SPARQL
  3. compression
  4. scalability

Conference

SAC 2012: ACM Symposium on Applied Computing
March 26–30, 2012
Trento, Italy

Acceptance Rates

SAC '12 Paper Acceptance Rate 270 of 1,056 submissions, 26%;
Overall Acceptance Rate 1,650 of 6,669 submissions, 25%

Article Metrics

  • Downloads (last 12 months): 6
  • Downloads (last 6 weeks): 1
Reflects downloads up to 07 Mar 2025

