Applying Grammar-Based Compression to RDF

  • Conference paper
  • The Semantic Web (ESWC 2021)

Abstract

Data compression for RDF knowledge graphs is used in an increasing number of settings. In parallel, several grammar-based graph compression algorithms have been developed to reduce the size of graphs. We port gRePair, a state-of-the-art grammar-based graph compression algorithm, to RDF and name the resulting approach RDFRePair. We compare its compression ratio with those of the state-of-the-art RDF compression approaches HDT, HDT++ and OFR, as well as with a \(k^2\)-tree-based RDF compression, in an extensive evaluation on 40 datasets. Our results suggest that RDFRePair achieves significantly better compression ratios and runtimes than gRePair. However, it is outperformed by \(k^2\) trees, which achieve the best overall compression ratio on real-world datasets. This advantage comes at the cost of time: OFR clearly outperforms \(k^2\) trees with respect to compression and decompression time. A pairwise Wilcoxon signed-rank test suggests that while OFR is significantly more time-efficient than HDT and \(k^2\) trees, there is no significant difference between the compression ratios achieved by \(k^2\) trees and OFR. We also point out directions for future research. All code and datasets are available at https://github.com/dice-group/GraphCompression and https://hobbitdata.informatik.uni-leipzig.de/rdfrepair/evaluation_datasets/, respectively.
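The grammar-based idea behind gRePair goes back to RePair on strings: repeatedly replace the most frequent digram (pair of adjacent symbols) with a fresh nonterminal and record a rule for it. gRePair lifts this idea to graphs, where a digram is, roughly, a pair of connected edges. The Python sketch below shows only the classic, naive string version, not gRePair or RDFRePair. The function names `count_pairs`, `repair` and `expand` are hypothetical, and the sketch assumes terminals are strings so that the tuple-valued nonterminals cannot clash with them.

```python
from collections import Counter


def count_pairs(seq):
    """Count non-overlapping occurrences of each adjacent pair (digram)."""
    counts = Counter()
    last_start = {}  # digram -> index where its last counted occurrence began
    for i in range(len(seq) - 1):
        pair = (seq[i], seq[i + 1])
        # In runs such as "aaa", the pair at i overlaps the one counted at i-1.
        if last_start.get(pair) == i - 1:
            continue
        counts[pair] += 1
        last_start[pair] = i
    return counts


def repair(text):
    """Naive string RePair (illustrative sketch, not gRePair): replace the most
    frequent digram with a new nonterminal until no digram occurs twice."""
    seq = list(text)
    rules = {}  # nonterminal -> (left symbol, right symbol)
    while True:
        counts = count_pairs(seq)
        if not counts:
            break
        pair, freq = counts.most_common(1)[0]
        if freq < 2:
            break  # replacing a unique digram would not save space
        nonterminal = ("N", len(rules))  # tuples never clash with str terminals
        rules[nonterminal] = pair
        # Greedy left-to-right, non-overlapping replacement.
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(nonterminal)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules


def expand(symbol, rules):
    """Decompress one symbol by recursively applying the grammar rules."""
    if symbol not in rules:
        return [symbol]
    left, right = rules[symbol]
    return expand(left, rules) + expand(right, rules)


if __name__ == "__main__":
    start, grammar = repair("abcabcabc")
    print(start, grammar)
    restored = "".join(t for s in start for t in expand(s, grammar))
    assert restored == "abcabcabc"
```

Calling `repair("abcabcabc")` yields a start sequence of three nonterminals plus two rules, and expanding the start sequence restores the input. Practical RePair implementations maintain digram frequencies incrementally with priority queues instead of recounting every digram after each replacement, as this sketch does for simplicity.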


Notes

  1. https://www.w3.org/Submission/HDT/.

  2. In the current implementation, we use 32-bit integers. They can be extended to 64 bits for very large graphs.

  3. All experiments were executed on a 64-bit Ubuntu 16.04 machine with an Intel(R) Xeon(R) CPU E5-2698 v3 @ 2.30 GHz, 64 CPUs and 128 GB RAM. Only the experiments for WatDiv were executed on a 64-bit Debian machine with 128 CPUs and 1 TB RAM.

  4. The datasets can be found at https://w3id.org/dice-research/data/rdfrepair/evaluation_datasets/. For the scholarly data (DF0–DF9), we use the rich datasets (see http://www.scholarlydata.org/dumps/).

  5. https://github.com/rdfhdt/hdt-java.

  6. HDT++ is available at https://github.com/antonioillera/iHDTpp-src. OFR is not publicly available; however, its authors kindly provided us with the binaries.

  7. For a fair comparison, we turned this feature of gRePair off in our evaluation, since otherwise it could not be used with the HDT dictionary.
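Notes 2 and 7 presuppose that RDF terms are replaced by integer IDs through a dictionary. The sketch below assumes that the 32-bit integers of note 2 are such term IDs; it uses one plain dictionary shared by subjects, predicates and objects, which is a simplification and not the HDT dictionary format. `encode_triples` and `decode_triples` are hypothetical helpers; packing uses Python's standard `struct` module with the standard-size, little-endian format `<III` (three unsigned 32-bit integers per triple).

```python
import struct

# Assumption: each triple is stored as three unsigned 32-bit term IDs.
TRIPLE_FORMAT = "<III"


def encode_triples(triples):
    """Map each distinct RDF term to an integer ID (simplified single
    dictionary, not HDT's) and pack every triple as three uint32 values."""
    term_to_id = {}
    id_to_term = []

    def term_id(term):
        if term not in term_to_id:
            term_to_id[term] = len(id_to_term)
            id_to_term.append(term)
        return term_to_id[term]

    packed = bytearray()
    for s, p, o in triples:
        ids = (term_id(s), term_id(p), term_id(o))
        # struct.pack raises struct.error if an ID does not fit in 32 bits,
        # which is the point where a 64-bit layout ("<QQQ") would be needed.
        packed += struct.pack(TRIPLE_FORMAT, *ids)
    return bytes(packed), id_to_term


def decode_triples(packed, id_to_term):
    """Invert encode_triples: unpack the ID triples and look up their terms."""
    return [
        tuple(id_to_term[i] for i in ids)
        for ids in struct.iter_unpack(TRIPLE_FORMAT, packed)
    ]


if __name__ == "__main__":
    data = [("ex:a", "ex:knows", "ex:b"), ("ex:b", "ex:knows", "ex:a")]
    blob, dictionary = encode_triples(data)
    print(len(blob), dictionary)
    assert decode_triples(blob, dictionary) == data
```

Under this layout each triple occupies 12 bytes; switching the format string to `<QQQ` would give the 64-bit variant mentioned in note 2 at twice the size per triple.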


Acknowledgements

This work has been supported by the German Federal Ministry for Economic Affairs and Energy (BMWi) within the project SPEAKER under grant no. 01MK20011U and by the EU H2020 Marie Skłodowska-Curie project KnowGraphs under grant agreement no. 860801.

Author information


Corresponding author: Michael Röder.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Röder, M., Frerk, P., Conrads, F., Ngomo, AC.N. (2021). Applying Grammar-Based Compression to RDF. In: Verborgh, R., et al. The Semantic Web. ESWC 2021. Lecture Notes in Computer Science, vol. 12731. Springer, Cham. https://doi.org/10.1007/978-3-030-77385-4_6


  • DOI: https://doi.org/10.1007/978-3-030-77385-4_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-77384-7

  • Online ISBN: 978-3-030-77385-4

  • eBook Packages: Computer Science, Computer Science (R0)
