Visionary: a framework for analysis and visualization of provenance data

  • Regular Paper
Knowledge and Information Systems

Abstract

Provenance is recognized as a central challenge in establishing reliability and providing security in computational systems. In scientific workflows, provenance is considered essential to support experiment reproducibility, interpretation of results, and problem diagnosis. We consider that these requirements also apply to new application domains, such as software processes and IoT. However, understanding and using provenance data well requires efficient and user-friendly mechanisms. Ontologies, complex networks, and software visualization can help in this process by generating new data insights and strategic information for decision-making. This paper presents the Visionary framework, designed to assist in the understanding and use of provenance data through ontologies, complex network analysis, and software visualization techniques. The framework captures provenance data and derives new information from it using ontologies and structural analysis of the provenance graph. The visualization presents and highlights the inferences and results obtained from the data analysis. Visionary is an application-domain-independent framework that can be adapted to any system that uses the PROV provenance model. Evaluations were carried out, and they provided some evidence that the framework assists in the understanding and analysis of provenance data when decision-making is needed.
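
To make "structural analysis of the provenance graph" concrete, the following is a minimal sketch and not Visionary's implementation. It assumes provenance is available as (subject, relation, object) triples whose relation names follow PROV-DM (wasGeneratedBy, used, wasAssociatedWith, wasDerivedFrom). The node names and the helper functions degree_counts and upstream are hypothetical, introduced only for illustration. The sketch computes node degrees, a basic network metric, and the lineage of an entity through plain graph reachability, which stands in here for the richer ontology-based inference the abstract mentions.

```python
from collections import defaultdict

# Hypothetical toy provenance records as (subject, relation, object) triples.
# Relation names follow PROV-DM; node names are invented for illustration.
EDGES = [
    ("report", "wasGeneratedBy", "analyze"),
    ("analyze", "used", "dataset"),
    ("analyze", "wasAssociatedWith", "alice"),
    ("dataset", "wasGeneratedBy", "collect"),
    ("collect", "wasAssociatedWith", "bob"),
    ("report", "wasDerivedFrom", "dataset"),
]


def degree_counts(edges):
    """Return (in-degree, out-degree) dictionaries for the directed graph."""
    indeg, outdeg = defaultdict(int), defaultdict(int)
    for src, _rel, dst in edges:
        outdeg[src] += 1
        indeg[dst] += 1
    return dict(indeg), dict(outdeg)


def upstream(edges, node):
    """Collect every node reachable from `node` by following edge direction.

    PROV relations point from an element to what it depends on, so the
    reachable set is the lineage of `node`.
    """
    adjacency = defaultdict(list)
    for src, _rel, dst in edges:
        adjacency[src].append(dst)
    seen, stack = set(), [node]
    while stack:
        current = stack.pop()
        for nxt in adjacency[current]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


if __name__ == "__main__":
    indeg, outdeg = degree_counts(EDGES)
    print(indeg["dataset"])                   # 2
    print(sorted(upstream(EDGES, "report")))  # ['alice', 'analyze', 'bob', 'collect', 'dataset']
```

In this toy graph, a high in-degree marks an element that many others depend on, and the lineage of "report" lists every activity, entity, and agent that contributed to it. Both are the kind of derived information a visualization could highlight.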

Notes

  1. Provenance data principles relate to the main components of provenance, i.e., entities, activities, and agents, as well as to the provenance types: prospective and retrospective. These concepts are explained in Sect. 2.

  2. http://www.ufjf.br/nenc/projetos/visionary-a-framework-for-analysis-and-visualization-of-provenance-data/.

  3. In computer science, an ontology is described as a formal and explicit specification of a shared conceptualization [33].

  4. A graph is directed when links have a specified direction.

  5. When the graph is connected by edges of different types.

  6. http://www.ufjf.br/nenc/projetos/visionary-a-framework-for-analysis-and-visualization-of-provenance-data/systematic-revisions/.

  7. http://www.ufjf.br/nenc/projetos/visionary-a-framework-for-analysis-and-visualization-of-provenance-data/systematic-revisions/.

  8. http://www.ufjf.br/nenc/projetos/visionary-a-framework-for-analysis-and-visualization-of-provenance-data/analyzes-based-on-graph-metrics/.

  9. The complete details of this processing can be accessed at http://www.ufjf.br/nenc/projetos/visionary-a-framework-for-analysis-and-visualization-of-provenance-data/analyzes-based-on-graph-metrics/.

  10. The complete questions are available at http://www.ufjf.br/nenc/projetos/visionary-a-framework-for-analysis-and-visualization-of-provenance-data/prov-process-evaluation-form/.

  11. https://drive.google.com/open?id=16vQvlkIQTxDkDgTtRuVRWXmUca3nV0oK.

  12. http://www.ufjf.br/nenc/projetos/visionary-a-framework-for-analysis-and-visualization-of-provenance-data/evaluation-roadmap/.

  13. http://www.ufjf.br/nenc/projetos/visionary-a-framework-for-analysis-and-visualization-of-provenance-data/term-of-consent/.

  14. http://www.ufjf.br/nenc/projetos/visionary-a-framework-for-analysis-and-visualization-of-provenance-data/characterization-questionnaire/.

  15. http://www.ufjf.br/nenc/projetos/visionary-a-framework-for-analysis-and-visualization-of-provenance-data/evaluation-roadmap/.

  16. https://drive.google.com/open?id=16vQvlkIQTxDkDgTtRuVRWXmUca3nV0oK.

  17. http://www.ufjf.br/nenc/projetos/visionary-a-framework-for-analysis-and-visualization-of-provenance-data/evaluation-questionnaire/.

  18. http://www.ufjf.br/nenc/projetos/visionary-a-framework-for-analysis-and-visualization-of-provenance-data/evaluation-roadmap/.

  19. http://www.ufjf.br/nenc/projetos/visionary-a-framework-for-analysis-and-visualization-of-provenance-data/evaluation-questionnaire/.

  20. https://github.com/pgcc/SPPV.

References

  1. Groth P, Moreau L (2013) Prov-overview. An overview of the prov family of documents. World Wide Web Consortium. http://www.w3.org/TR/2013/NOTE...-20130430/. Accessed 31 Aug 2021

  2. Acar UA, Ahmed A, Cheney J, Perera R (2012) A core calculus for provenance. POST 7215:410–429. https://doi.org/10.1007/978-3-642-28641-4_22

  3. Simmhan YL, Plale B, Gannon D (2005) A survey of data provenance in e-science. ACM SIGMOD Rec 34(3):31–36. https://doi.org/10.1145/1084805.1084812

  4. Muniswamy-Reddy K-K, Holland DA, Braun U, Seltzer MI (2006) Provenance-aware storage systems. In: USENIX annual technical conference, general track, pp 43–56

  5. Costa G, Werner C, Braga RM, Dalpra H, Stroele V, Araujo MA (2019) Deriving strategical information for software development processes using provenance data and ontology techniques. Int J Bus Process Integr Manag (Print). https://doi.org/10.1504/IJBPIM.2019.100924

  6. Muniswamy-Reddy K-K, Seltzer M (2010) Provenance as first class cloud data. ACM SIGOPS Oper Syst Rev 43(4):11–16. https://doi.org/10.1145/1713254.1713258

  7. Margo DW, Smogor R (2010) Using provenance to extract semantic file attributes. In: Proceedings of the 2nd conference on theory and practice of provenance (TAPP'10). USENIX Association, USA, p 7

  8. Cheney J, Chiticariu L, Tan W-C et al (2009) Provenance in databases: why, how, and where. Found Trends® Databases 1(4):379–474. https://doi.org/10.1561/1900000006

  9. Wang Q, Hassan WU, Li D, Jee K, Yu X, Zou K, Chen H (2020) You are what you do: hunting stealthy malware via data provenance analysis. In: Symposium on network and distributed system security (NDSS). https://doi.org/10.14722/ndss.2020.24167

  10. Sigwart M, Borkowski M, Peise M, Schulte S, Tai S (2020) A secure and extensible blockchain-based data provenance framework for the Internet of Things. Pers Ubiquit Comput. https://doi.org/10.1007/s00779-020-01417-z

  11. Moreau L, Clifford B, Freire J, Futrelle J, Gil Y, Groth P, Kwasnikowska N, Miles S, Missier P, Myers J et al (2011) The open provenance model core specification (v1.1). Future Gener Comput Syst 27(6):743–756. https://doi.org/10.1016/j.future.2010.07.005

  12. Buneman P, Khanna S, Tan WC (2001) Why and where: a characterization of data provenance. In: Springer. ICDT, 1, pp 316–330. https://doi.org/10.1007/3-540-44503-X_20

  13. Packer HS, Moreau L (2014) Sentence templating for explaining provenance. In: Ludäscher B, Plale B (eds) Provenance and annotation of data and processes. IPAW 2014. Lecture notes in computer science, vol 8628. Springer, Cham. https://doi.org/10.1007/978-3-319-16462-5_33

  14. Arshad B, Munir K, Mcclatchey R, Liaquat S (2015) Position paper: provenance data visualization for neuroimaging analysis. arXiv:1502.01556

  15. Hoekstra R, Groth P (2014) Prov-o-viz-understanding the role of activities in provenance. In: International provenance and annotation workshop. Springer, pp 215–220. https://doi.org/10.1007/978-3-319-16462-5_18

  16. Oliveira W, Ambrosio L, Braga R, Stroele V, David JMN, Campos F (2017) A framework for provenance analysis and visualization. Procedia Comput Sci 108:1592–1601. https://doi.org/10.1016/j.procs.2017.05.216

  17. Pérez B, Rubio J, Sáenz-Ádan C (2018) A systematic review of provenance systems. Knowl Inf Syst 57:495–543. https://doi.org/10.1007/s10115-018-1164-3

  18. Kohwalter T, Oliveira T, Freire J, Clua E, Murta L (2016) Prov viewer: a graph-based visualization tool for interactive exploration of provenance data. In: International provenance and annotation workshop. Springer, pp 71–82. https://doi.org/10.1007/978-3-319-40593-3_6

  19. Cheah Y-W, Plale B (2012) Provenance analysis: towards quality provenance. In: 2012 IEEE 8th international conference on E-science (e-Science). IEEE, pp 1–8. https://doi.org/10.1109/eScience.2012.6404480

  20. Dominguez E, Pérez B, Rubio J, Sáenz-Ádan C (2017) Developing provenance-aware query systems: an occurrence-centric approach. Knowl Inf Syst 50:661–688. https://doi.org/10.1007/s10115-016-0950-z

  21. Richardson DP, Moreau L (2016) Towards the domain agnostic generation of natural language explanations from provenance graphs for casual users. In: International provenance and annotation workshop. Springer, pp 95–106. https://doi.org/10.1007/978-3-319-40593-3_8

  22. Hevner AR, March ST, Park J, Ram S (2004) Design science in information systems research. MIS Q 28(1):75–105. https://doi.org/10.2307/25148625

  23. Moreau L, Kwasnikowska N, Bussche JV (2009) The foundations of the open provenance model. http://eprints.soton.ac.uk/id/eprint/267282. Accessed 31 Aug 2021

  24. Lim C, Lu S, Chebotko A, Fotouhi F (2010) Prospective and retrospective provenance collection in scientific workflow environments. In: 2010 IEEE international conference on services computing (SCC). IEEE, pp 449–456. https://doi.org/10.1109/SCC.2010.18

  25. Bowers S, Mcphillips T, Ludascher B, Cohen S, Davidson SB (2006) A model for user-oriented data provenance in pipelined scientific workflows. In: International provenance and annotation workshop. Springer, pp 133–147. https://doi.org/10.1007/11890850_15

  26. Buneman P, Chapman A, Cheney J, Vansummeren SA (2006) Provenance model for manually curated data. IPAW 6:162–170. https://doi.org/10.1007/11890850_17

  27. Cao B, Plale B, Subramanian G, Robertson E, Simmhan Y (2009) Provenance information model of karma version 3. In: 2009 world conference on services-I. IEEE, pp 348–351. https://doi.org/10.1109/SERVICES-I.2009.54

  28. Davidson SB, Freire J (2008) Provenance and scientific workflows: challenges and opportunities. In: Proceedings of the 2008 ACM SIGMOD international conference on management of data. ACM, pp 1345–1350. https://doi.org/10.1145/1376616.1376772

  29. Moreau L, Missier P (2013) Prov-dm: the prov data model, v. 3. https://www.w3.org/TR/prov-dm/. Accessed 31 Aug 2021

  30. Lebo T, Sahoo S, Mcguinness D, Belhajjame K, Cheney J, Corsar D, Garijo D, Soiland Reyes S, Zednik S, Zhao J (2013) Prov-O: the prov ontology. W3C recommendation, 30. https://www.w3.org/TR/2011/WD-prov-o-20111213/. Accessed 31 Aug 2021

  31. Harary F (1969) Graph theory. Addison-Wesley, Reading

  32. Newman MEJ (2010) Networks: an introduction. Oxford University, Oxford (ISBN: 0199206651)

  33. Guarino N et al (1998) Formal ontology and information systems. Proc FOIS 98:81–97

  34. Wohlin C, Runeson P, Host M, Ohlsson MC, Regnell B, Wesslen A (2012) Experimentation in software engineering. Springer, Berlin

  35. Chen P, Plale B, Cheah YW, Ghoshal D, Jensen S, Luo Y (2012) Visualization of network data provenance. In: 2012 19th international conference on high-performance computing (HiPC). IEEE, pp 1–9. https://doi.org/10.1109/HiPC.2012.6507517

  36. Karsai L (2016) Clustering provenance. Ph.D. thesis, University of Sydney. https://doi.org/10.1145/2939502.2939508

  37. Ragan E, Endert A, Sanyal J, Chen J (2016) Characterizing provenance in visualization and data analysis: an organizational framework of provenance types and purposes. IEEE Trans Vis Comput Graph 22(1):31–40. https://doi.org/10.1109/TVCG.2015.2467551

  38. Stitz H, Gratzl S, Piringer H, Zichner T, Streit M (2018) KnowledgePearls: provenance-based visualization retrieval. IEEE Trans Vis Comput Graph (VAST ’18) 25(1):120–130. https://doi.org/10.1109/TVCG.2018.2865024

  39. Anand MK, Bowers S, Ludascher B (2010) Provenance browser: Displaying and querying scientific workflow provenance graphs. In: 2010 IEEE 26th international conference on data engineering (ICDE). IEEE, pp 1201–1204. https://doi.org/10.1109/ICDE.2010.5447741

  40. Borkin MA, Yeh CS, Boyd M, Macko P, Gajos KZ, Seltzer M, Pfister H (2013) Evaluation of filesystem provenance visualization tools. IEEE Trans Vis Comput Graph 19(12):2476–2485. https://doi.org/10.1109/TVCG.2013.155

  41. Kadivar N, Chen V, Dunsmuir D, Lee E, Qian C, Dill J, Shaw C, Woodbury R (2009) Capturing and supporting the analysis process. In: IEEE symposium on visual analytics science and technology. VAST 2009. IEEE, pp 131–138. https://doi.org/10.1109/VAST.2009.5333020

  42. Chen YV, Qian ZC, Woodbury R, Dill J, Shaw CD (2014) Employing a parametric model for analytic provenance. ACM Trans Interact Intell Syst (TiiS) 4(1):6. https://doi.org/10.1145/2591510

  43. Rio ND, Silva PPD (2007) Probe-it! Visualization support for provenance. In: International symposium on visual computing. Springer, pp 732–741. https://doi.org/10.1007/978-3-540-76856-2_72

  44. Hunter J, Cheung K (2007) Provenance explorer-a graphical interface for constructing scientific publication packages from provenance trails. Int J Digit Libr 7(1):99–107. https://doi.org/10.1007/s00799-007-0018-5

  45. Khan S, Kanturska U, Waters T, Eaton J, Banares-Alcantara R, Chen M (2016) Ontology-assisted provenance visualization for supporting enterprise search of engineering and business files. Adv Eng Inform 30(2):244–257. https://doi.org/10.1016/j.aei.2016.04.003

  46. Stitz H, Luger S, Streit M, Gehlenborg N (2016) Avocado: visualization of workflow-derived data provenance for reproducible biomedical research. In: Computer graphics forum. Wiley Online Library, vol 35, no 3, pp 481–490. https://doi.org/10.1111/cgf.12924

  47. Macko P, Seltzer M (2011) Provenance map orbiter: interactive exploration of large provenance graphs. In: Proceedings of the 3rd USENIX workshop on the theory and practice of provenance (TaPP '11), June 20–21, Heraklion, Crete, Greece. USENIX Association, Berkeley, CA

  48. Callahan SP et al. (2006) VisTrails: visualization meets data management. In: Proceedings of the 2006 ACM SIGMOD international conference on management of Data. ACM, New York, NY, USA, pp 745–747. https://doi.org/10.1145/1142473.1142574

  49. Altintas I et al. (2004) Kepler: an extensible system for design and execution of scientific workflows. In: 16th international conference on scientific and statistical database management. Proceedings, pp 423–424. https://doi.org/10.1109/SSDM.2004.1311241

  50. Hull D (2006) Taverna: a tool for building and running workflows of services. Nucleic Acids Res 34(suppl 2):W729–W732. https://doi.org/10.1093/nar/gkl320

  51. Ceolin D, Groth P, Maccatrozzo V, Fokkink W, Hage WRV, Nottamkandath A (2016) Combining user reputation and provenance analysis for trust assessment. J Data Inf Qual (JDIQ) 7(1–2):6. https://doi.org/10.1145/2818382

  52. Mcgrath RE, Futrelle J (2008) Reasoning about provenance with owl and swrl rules. In: AAAI spring symposium: AI meets business rules and process management, pp 87–92

  53. Missier P, Belhajjame K (2012) A prov encoding for provenance analysis using deductive rules. In: IPAW. Springer, pp 67–81. https://doi.org/10.1007/978-3-642-34222-6_6

  54. Prat N, Madnick S (2008) Measuring data believability: a provenance approach. In: Proceedings of the 41st annual Hawaii international conference on system sciences. IEEE, pp 393–393. https://doi.org/10.1109/HICSS.2008.243

  55. Strubulis C, Tzitzikas Y, Doerr M, Flouris G (2012) Evolution of workflow provenance information in the presence of custom inference rules. In: 3rd intern. workshop on the role of semantic web in provenance management (SWPM'12), co-located with ESWC'12, Heraklion, Crete

  56. Cuevas-Vicenttin V et al (2016) ProvONE: a PROV extension data model for scientific workflow provenance. https://purl.dataone.org/provone-v1-dev. Accessed 31 Aug 2021

  57. Dalpra H (2016) PROV-process: provenance data applied to software development process. Master Thesis, Federal University of Juiz de Fora. http://www.ufjf.br/pgcc/files/2014/06/Humberto-Dalpra.pdf (in Portuguese). Accessed 31 Aug 2021

  58. Sirqueira TF, Braga R, Araujo MA, David JM, Campos F, Stroele V (2017) An approach to configuration management of scientific workflows. Int J Web Portals (IJWP) 9(2):20–46. https://doi.org/10.4018/IJWP.2017070102

  59. Sirin E, Parsia B, Cuenca Grau B, Kalyanpur A, Katz Y (2007) Pellet: a practical OWL-DL reasoner. Web Semant 5(2):51–53. https://doi.org/10.1016/j.websem.2007.03.004

  60. Dalpra H, Castro G, Ferrenzini T, Braga R, Werner C, David JMN, Campos F (2015) Using ontology and data provenance to improve software processes. In: ONTOBRAS, 2015, São Paulo. Proceedings of Ontobras

  61. Horrocks I, Patel-Schneider PF, Boley H, Tabet S, Grosof B, Dean M (2004) SWRL: a semantic web rule language combining OWL and RuleML. https://www.w3.org/Submission/SWRL/. Accessed 10 May 2018

  62. Ebden M, Huynh T, Moreau L, Ramchurn S, Roberts S (2012) Network analysis on provenance graphs from a crowdsourcing application. In: Provenance and annotation of data and processes. Springer, pp 168–182. https://doi.org/10.1007/978-3-642-34222-6_13

  63. Huynh TD, Ebden M, Venanzi M, Ramchurn SD, Roberts S, Moreau L (2013) Interpretation of crowdsourced activities using provenance network analysis. In: First AAAI conference on human computation and crowdsourcing. http://eprints.soton.ac.uk/id/eprint/357199. Accessed 31 Aug 2021

  64. OMG (2011) BPM Notation (bpmn) version 2.0. https://www.omg.org/spec/BPMN/2.0/About-BPMN/. Accessed 31 Aug 2021

  65. Basili V, Caldiera G, Rombach D (1994) GQM paradigm. Computer encyclopedia of software engineering. Wiley

  66. Schwaber K (1997) SCRUM development process. In: Sutherland J, Casanave C, Miller J, Patel P, Hollowell G (eds) Business object design and implementation. Springer, London. https://doi.org/10.1007/978-1-4471-0947-1_11

  67. Classe T, Braga R, David JMN, Campos F, Arbex W (2017) A distributed infrastructure to support scientific experiments. J Grid Comput 1:1–26. https://doi.org/10.1007/s10723-017-9401-7

  68. Lethbridge TC, Sim SE, Singer J (2005) Studying software engineers: data collection techniques for software field studies. Empir Softw Eng 10:311–341. https://doi.org/10.1007/s10664-005-1290-x

  69. Hossin M, Sulaiman MN (2015) A review on evaluation metrics for data classification evaluations. Int J Data Min Knowl Manag Process 5(2):1

  70. Runeson P, Host M, Rainer A, Regnell B (2012) Case study research in software engineering: Guidelines and examples. Wiley. ISBN: 978-1-118-10435-4

Acknowledgements

We would like to thank the people who participated in the evaluation.

Funding

This work was partially funded by UFJF/Brazil, CAPES/Brazil, CNPq/Brazil (Grant: 311595/2019-7), and FAPEMIG/Brazil (Grant: APQ-02685-17) and (Grant: APQ-02194-18).

Author information

Corresponding author

Correspondence to Regina Braga.

Ethics declarations

Conflict of interest

All authors: none.

About this article

Cite this article

de Oliveira, W., Braga, R., David, J.M.N. et al. Visionary: a framework for analysis and visualization of provenance data. Knowl Inf Syst 64, 381–413 (2022). https://doi.org/10.1007/s10115-021-01645-6

  • DOI: https://doi.org/10.1007/s10115-021-01645-6
