Trusted Provenance of Collaborative, Adaptive, Process-Based Data Processing Pipelines

  • Conference paper
Enterprise Design, Operations, and Computing. EDOC 2023 Workshops (EDOC 2023)

Part of the book series: Lecture Notes in Business Information Processing (LNBIP, volume 498)

Abstract

The abundance of data available today provides many opportunities to gain insights across domains. Data processing pipelines are one way to automate data processing approaches and are widely used in both industry and academia. In many cases, data and processing are available in distributed environments, and workflow technology is well suited to automating data processing pipelines while also supporting collaborative, trial-and-error experimentation in terms of pipeline architecture for different application and scientific domains. In addition to the need for flexibility during pipeline execution, there is a lack of trust in such collaborative settings, where interactions cross organisational boundaries. Capturing provenance information about pipeline execution and the processed data is common and certainly a first step towards enabling trusted collaboration. However, current solutions do not capture changes to any aspect of the processing pipelines themselves or to the data used, and thus do not allow for provenance of change. The objective of this work is therefore to investigate how provenance of workflow or data change during execution can be enabled. As a first step, we have developed a preliminary architecture of a service, the Provenance Holder, which enables provenance of collaborative, adaptive data processing pipelines in a trusted manner. In future work, we will focus on the concepts necessary to enable trusted provenance of change, as well as on the detailed service design, realization and evaluation.

Notes

  1. “The provenance of digital objects represents their origins.” (see note 2)

  2. https://www.w3.org/TR/prov-primer/

  3. We use the term method for disambiguation purposes only.

Author information

Corresponding author

Correspondence to Ludwig Stage.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Stage, L. (2024). Trusted Provenance of Collaborative, Adaptive, Process-Based Data Processing Pipelines. In: Sales, T.P., de Kinderen, S., Proper, H.A., Pufahl, L., Karastoyanova, D., van Sinderen, M. (eds) Enterprise Design, Operations, and Computing. EDOC 2023 Workshops. EDOC 2023. Lecture Notes in Business Information Processing, vol 498. Springer, Cham. https://doi.org/10.1007/978-3-031-54712-6_25

  • DOI: https://doi.org/10.1007/978-3-031-54712-6_25

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-54711-9

  • Online ISBN: 978-3-031-54712-6

  • eBook Packages: Computer Science, Computer Science (R0)
