A case study on propagating and updating provenance information using the CIDOC CRM

International Journal on Digital Libraries

Abstract

Provenance information for digital objects maintained by digital libraries and archives is crucial for authenticity assessment, reproducibility and accountability. Such information is commonly stored as metadata in various Metadata Repositories (MRs) or Knowledge Bases (KBs). In many settings, however, storing the complete provenance of each digital object is prohibitive because of the storage space it requires. In this paper, we introduce provenance-based inference rules as a means to complete the provenance information, to reduce the amount of provenance information that has to be stored, and to ease quality control (e.g., corrections). In brief, we show how provenance information can be propagated by identifying a number of basic inference rules over a core conceptual model for representing provenance. The propagation of provenance concerns fundamental modelling concepts such as actors, activities, events, devices and information objects, and their associations. However, since an MR/KB is not static but changes over time, the question arises of how to satisfy update requests while still supporting these inference rules. To this end, we specify the required add/delete operations, consider two different semantics for the deletion of information, and provide the corresponding update algorithms. Finally, we report extensive comparative results for different repository policies regarding the derivation of new knowledge, on datasets containing up to one million RDF triples. The results clarify the trade-offs that the use of inference rules introduces in storage space and in the performance of queries and updates.

Notes

  1. http://www.europeana.eu/.

  2. http://www.fedora-commons.org/.

  3. It was initially defined during the EU Project CASPAR (http://www.casparpreserves.eu/) (FP6-2005-IST-033572) and its evolution continued during the EU Project IST IP 3D-COFORM (http://www.3d-coform.eu/).

  4. http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/VOSDownload.

  5. http://www.3d-coform.eu/.

  6. http://en.wikipedia.org/wiki/Directed_acyclic_graph.

  7. http://www.ontotext.com/owlim.

  8. http://virtuoso.openlinksw.com/.

Acknowledgments

This work was done in the context of the following European projects: APARSEN (Alliance Permanent Access to the Records of Science in Europe Network, FP7 Network of Excellence, project number: 269977, duration: 2011–2014), DIACHRON (Managing the Evolution and Preservation of the Data Web, FP7 IP, project number: 601043, duration: 2013–2016), 3D-COFORM IST IP (Tools and Expertise for 3D Collection Formation, project number: 231809, duration: 2008–2012) and PlanetData (FP7 Network of Excellence, project number: 257641, duration: 2010–2014).

Author information

Corresponding author

Correspondence to Yannis Tzitzikas.

Appendices

Appendix A: Set of update operations

Since we focus on three inference rules, we should define operations for satisfying update requests related to these rules. The signatures of the required change operations are listed below (Table 1):

Table 1 Update operations

Appendix B: Algorithms of update operations

Below, we list the algorithms of our set of update operations which were presented previously in Appendix A.

[Figures d–ae: algorithm listings for the update operations; not reproduced in this text.]
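
To make the idea of updating a repository under an inference rule concrete, the following is a minimal sketch and not the algorithms of this appendix. It assumes a single illustrative rule (the actor of an activity is also the actor of its parts) and two illustrative deletion policies; the property names `carriedOutBy` and `partOf` and all function names are hypothetical and do not come from the paper or from the CIDOC CRM.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical property names, chosen for illustration only.
CARRIED_OUT_BY = "carriedOutBy"
PART_OF = "partOf"


def closure(explicit: Set[Triple]) -> Set[Triple]:
    """Return the explicit triples plus everything the single rule derives.

    Assumed rule: (sub, partOf, whole) and (whole, carriedOutBy, actor)
    imply (sub, carriedOutBy, actor). Applied until a fixpoint is reached.
    """
    facts = set(explicit)
    changed = True
    while changed:
        changed = False
        parts = [(s, o) for (s, p, o) in facts if p == PART_OF]
        actors = [(s, o) for (s, p, o) in facts if p == CARRIED_OUT_BY]
        for sub, whole in parts:
            for activity, actor in actors:
                if activity == whole:
                    derived = (sub, CARRIED_OUT_BY, actor)
                    if derived not in facts:
                        facts.add(derived)
                        changed = True
    return facts


def add(explicit: Set[Triple], triple: Triple) -> Set[Triple]:
    """Store a new explicit triple; inferences are recomputed on demand."""
    return explicit | {triple}


def delete_drop_dependents(explicit: Set[Triple], triple: Triple) -> Set[Triple]:
    """Policy 1 (assumed): remove the triple; inferences that relied on it vanish."""
    return explicit - {triple}


def delete_keep_dependents(explicit: Set[Triple], triple: Triple) -> Set[Triple]:
    """Policy 2 (assumed): remove the triple but keep what it used to imply.

    Inferences that would be lost are stored explicitly before the deletion.
    """
    before = closure(explicit)
    remaining = explicit - {triple}
    lost = before - closure(remaining) - {triple}
    return remaining | lost


# Example repository: only one actor assertion is stored explicitly.
kb: Set[Triple] = {
    ("a1", CARRIED_OUT_BY, "bob"),
    ("a2", PART_OF, "a1"),
    ("a3", PART_OF, "a2"),
}
```

In this sketch the repository stores three triples while `closure(kb)` yields five, which is the storage saving the rules aim for. The two delete functions differ only in whether previously inferred triples survive the removal of the triple they were derived from; whether either matches the two deletion semantics defined in the paper is not something this text confirms.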

Cite this article

Strubulis, C., Flouris, G., Tzitzikas, Y. et al. A case study on propagating and updating provenance information using the CIDOC CRM. Int J Digit Libr 15, 27–51 (2014). https://doi.org/10.1007/s00799-014-0125-z
