Provenance compression scheme based on graph patterns for large RDF documents

The Journal of Supercomputing

Abstract

Provenance data are metadata that describe the source or modification history of other data. Because provenance accumulates every time the source data are modified, it can grow to dozens of times the size of the original data. Schemes that compress large volumes of provenance data efficiently are therefore required. In this paper, we propose a new Resource Description Framework (RDF) provenance compression scheme that considers graph patterns. The proposed scheme applies dictionary encoding, which converts the provenance data into numeric data and thereby reduces the space occupied by strings. Unlike existing provenance compression schemes, the proposed scheme uses RDF documents to manage the source RDF documents on the semantic web so that changes in the provenance data can be tracked. It reduces storage space by compressing the source RDF documents according to their patterns, and it compresses the provenance data according to the patterns of active nodes in the PROV model. Compressing along the provenance flow in this way improves compression performance. A performance evaluation in terms of compression rate and processing time confirms the effectiveness of the proposed scheme.
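
The dictionary-encoding step mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the sequential ID assignment, and the sample triples (written with PROV-O style property names) are assumptions chosen for illustration. Only the general idea of replacing repeated strings with numeric IDs comes from the abstract.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
EncodedTriple = Tuple[int, int, int]


def dictionary_encode(triples: List[Triple]) -> Tuple[Dict[str, int], List[EncodedTriple]]:
    """Map each distinct term to an integer ID and rewrite the triples with IDs.

    Hypothetical helper: IDs are assigned sequentially in order of first
    appearance, which is an assumption of this sketch.
    """
    term_to_id: Dict[str, int] = {}
    encoded: List[EncodedTriple] = []
    for triple in triples:
        ids = []
        for term in triple:
            # Assign the next free ID the first time a term is seen.
            if term not in term_to_id:
                term_to_id[term] = len(term_to_id)
            ids.append(term_to_id[term])
        encoded.append((ids[0], ids[1], ids[2]))
    return term_to_id, encoded


def dictionary_decode(term_to_id: Dict[str, int], encoded: List[EncodedTriple]) -> List[Triple]:
    """Invert the dictionary and restore the original string triples."""
    id_to_term = {i: t for t, i in term_to_id.items()}
    return [(id_to_term[s], id_to_term[p], id_to_term[o]) for s, p, o in encoded]


# Hypothetical provenance triples: each document version is derived from the
# previous one and generated by an editing activity.
triples = [
    ("ex:doc_v2", "prov:wasDerivedFrom", "ex:doc_v1"),
    ("ex:doc_v2", "prov:wasGeneratedBy", "ex:edit_1"),
    ("ex:doc_v3", "prov:wasDerivedFrom", "ex:doc_v2"),
    ("ex:doc_v3", "prov:wasGeneratedBy", "ex:edit_2"),
]

dictionary, encoded = dictionary_encode(triples)
print(encoded)
```

Each distinct string is stored once in the dictionary, so repeated subjects and predicates cost only one integer per occurrence. The encoding is lossless, since `dictionary_decode(dictionary, encoded)` restores the original triples. A pattern-based stage such as the one the abstract describes would then operate on the integer triples; that stage is not sketched here.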

Acknowledgements

This work was supported by an Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. B0101-15-0266, Development of High Performance Visual BigData Discovery Platform for Large-Scale Realtime Data Analysis), by the Next-Generation Information Computing Development Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT (No. NRF-2017M3C4A7069432), and by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (No. NRF-2019R1I1A1A01062289).

Author information

Correspondence to Jaesoo Yoo.

Cite this article

Bok, K., Han, J., Lim, J. et al. Provenance compression scheme based on graph patterns for large RDF documents. J Supercomput 76, 6376–6398 (2020). https://doi.org/10.1007/s11227-019-02926-2

  • DOI: https://doi.org/10.1007/s11227-019-02926-2
