Abstract
Data compression for RDF knowledge graphs is used in an increasing number of settings. In parallel, several grammar-based graph compression algorithms have been developed to reduce the size of graphs. We port gRePair, a state-of-the-art grammar-based graph compression algorithm, to RDF and name the result RDFRePair. We compare its compression ratio with that of the state-of-the-art RDF compression approaches HDT, HDT++ and OFR as well as a \(k^2\)-tree-based RDF compression. We run an extensive evaluation on 40 datasets. Our results suggest that RDFRePair achieves significantly better compression ratios and runtimes than gRePair. However, it is outperformed by \(k^2\) trees, which achieve the overall best compression ratio on real-world datasets. This better compression ratio comes at the cost of time, as \(k^2\) trees are clearly outperformed by OFR w.r.t. compression and decompression time. A pairwise Wilcoxon signed-rank test suggests that while OFR is significantly more time-efficient than HDT and \(k^2\) trees, there is no significant difference between the compression ratios achieved by \(k^2\) trees and OFR. In addition, we point out future directions for research. All code and datasets are available at https://github.com/dice-group/GraphCompression and https://hobbitdata.informatik.uni-leipzig.de/rdfrepair/evaluation_datasets/, respectively.
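To make the pairwise comparison mentioned in the abstract more concrete, the following is a minimal sketch of how the Wilcoxon signed-rank statistic can be computed for two approaches measured on the same datasets. It is not the evaluation code of the paper: the function name `wilcoxon_signed_rank` and the input values are hypothetical and chosen only for illustration. The sketch stops at the rank sums; in practice, a library routine such as `scipy.stats.wilcoxon` would be used to obtain a p-value.

```python
from typing import List, Tuple


def wilcoxon_signed_rank(x: List[float], y: List[float]) -> Tuple[float, float, int]:
    """Return (W_plus, W_minus, n) for the paired samples x and y.

    Pairs with a zero difference are dropped. Tied absolute differences
    receive the average of the ranks they span.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")

    # Signed differences of the pairs, ignoring pairs that do not differ.
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)

    # Indices of the differences, ordered by absolute value.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))

    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the end j of the run of tied absolute differences starting at i.
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        # Average of the 1-based ranks i+1 .. j+1.
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus, n


# Hypothetical measurements of two approaches on five datasets
# (illustrative values only, not results from the paper).
approach_a = [10, 12, 9, 20, 15]
approach_b = [12, 11, 13, 20, 10]
w_plus, w_minus, n = wilcoxon_signed_rank(approach_a, approach_b)
```

The smaller of the two rank sums is the usual test statistic. If the two sums are close to each other, as in this toy input, the test finds no evidence that one approach is systematically better than the other.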
Notes
- 1.
- 2.
In the current implementation, we use 32-bit integers. They can be extended to 64 bits for very large graphs.
- 3.
All experiments were executed on a 64-bit Ubuntu 16.04 machine with an Intel(R) Xeon(R) CPU E5-2698 v3 @ 2.30 GHz, 64 CPUs and 128 GB RAM. Only the experiments for WatDiv were executed on a 64-bit Debian machine with 128 CPUs and 1 TB RAM.
- 4.
The datasets can be found at https://w3id.org/dice-research/data/rdfrepair/evaluation_datasets/. For scholarly data (DF0–DF9), we use the rich datasets (see http://www.scholarlydata.org/dumps/).
- 5.
- 6.
HDT++ is available at https://github.com/antonioillera/iHDTpp-src. OFR is not publicly available. However, the authors kindly provided us with the binaries.
- 7.
For a fair comparison, we turned off this feature of gRePair in our evaluation. Otherwise, it could not be used with the HDT dictionary.
Acknowledgements
This work has been supported by the German Federal Ministry for Economic Affairs and Energy (BMWi) within the project SPEAKER under grant no. 01MK20011U and by the EU H2020 Marie Skłodowska-Curie project KnowGraphs under grant agreement no. 860801.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Röder, M., Frerk, P., Conrads, F., Ngomo, AC.N. (2021). Applying Grammar-Based Compression to RDF. In: Verborgh, R., et al. The Semantic Web. ESWC 2021. Lecture Notes in Computer Science, vol 12731. Springer, Cham. https://doi.org/10.1007/978-3-030-77385-4_6
DOI: https://doi.org/10.1007/978-3-030-77385-4_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-77384-7
Online ISBN: 978-3-030-77385-4
eBook Packages: Computer Science, Computer Science (R0)