A systematic review of provenance systems

  • Survey Paper
  • Published in: Knowledge and Information Systems

Abstract

Provenance refers to the entire body of information, comprising all the elements and their relationships, that contributes to the existence of a piece of data. Knowledge of provenance brings many benefits, such as verifying a product, reproducing results, sharing and reusing knowledge, and assessing data quality and validity. With such tangible benefits, it is no wonder that research on provenance has grown rapidly in recent years and has been applied to a wide range of scientific disciplines. Some years ago, provenance information was managed and recorded manually. Given the huge volume of information available nowadays, performing such tasks manually is no longer an option. The problem of systematically understanding, capturing and managing provenance has gained significant attention from the research community and industry over the past decades. As a consequence, a large number of contributions and provenance systems have been proposed as solutions to these tasks. The overall objective of this paper is to map the landscape of published systems in the field of provenance, with two main purposes. First, we seek to evaluate the desired characteristics that provenance systems are expected to have. Second, we aim to identify a set of representative systems (both early and recent) to be exhaustively analyzed according to such characteristics. In particular, we have performed a systematic literature review, identifying a comprehensive set of 105 relevant resources in all. The results show that there are common characteristics of provenance systems that are widely recognized throughout the literature on the topic. Based on these results, we have defined a six-dimensional taxonomy of provenance characteristics covering: general aspects, data capture, data access, subject, storage, and non-functional aspects. Additionally, the study has found that 25 provenance systems are the most referenced in the provenance literature. This study exhaustively analyzes and compares these systems according to our taxonomy and pinpoints future directions.

Acknowledgements

This work has been partially supported by the Spanish Ministerio de Economía y Competitividad (Project MTM2014-54151-P and Project EDU2016-79838-P) and by the University of La Rioja (Grant FPI-UR-2015).

Author information

Corresponding author

Correspondence to Beatriz Pérez.

Cite this article

Pérez, B., Rubio, J. & Sáenz-Adán, C. A systematic review of provenance systems. Knowl Inf Syst 57, 495–543 (2018). https://doi.org/10.1007/s10115-018-1164-3

  • DOI: https://doi.org/10.1007/s10115-018-1164-3
