Provenance-based analysis of data-centric processes

  • Regular Paper
  • The VLDB Journal

Abstract

We consider in this paper static analysis of the possible executions of data-dependent applications, namely applications whose control flow is guided by a finite-state machine, as well as by the state of an underlying database. We note that previous work in this context has not addressed two important features of such analysis, namely analysis under hypothetical scenarios, such as changes to the application’s state machine and/or to the underlying database, and the consideration of meta-data, such as cost or access privileges. Observing that semiring-based provenance has been proven highly effective in supporting these two features for database queries, we develop in this paper a semiring-based provenance framework for the analysis of data-dependent processes, accounting for hypothetical reasoning and meta-data. The development addresses two interacting new challenges: (1) combining provenance annotations for both information that resides in the database and information about external inputs (e.g., user choices) and (2) finitely capturing infinitely many process executions. We have implemented our framework as part of the PROPOLIS system.
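The idea of annotating a process with semiring values can be illustrated with a minimal sketch. It is not the PROPOLIS implementation: the names `Semiring`, `reach_annotation`, and `kept`, and the toy shopping process, are hypothetical, and the sketch ignores the database and guarding queries entirely. It also assumes that going around a loop never improves the result (true for the two semirings used here, with non-negative costs), which is one simple way to summarize infinitely many executions with a finite computation.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Semiring:
    zero: Any                         # neutral for plus, annihilates times
    one: Any                          # neutral for times
    plus: Callable[[Any, Any], Any]   # combines alternative executions
    times: Callable[[Any, Any], Any]  # combines consecutive transitions


# Tropical semiring (min, +): annotations are costs, result is the cheapest execution.
TROPICAL = Semiring(float("inf"), 0.0, min, lambda a, b: a + b)
# Boolean semiring (or, and): annotations record whether a transition is kept.
BOOLEAN = Semiring(False, True, lambda a, b: a or b, lambda a, b: a and b)


def reach_annotation(sr, states, edges, start, target):
    """Semiring sum, over executions from start to target, of the product of
    transition annotations. Assumes cycles never improve the sum, so executions
    with at most len(states) - 1 transitions suffice."""
    value = {s: sr.zero for s in states}
    value[start] = sr.one
    for _ in range(len(states) - 1):
        updated = dict(value)
        for src, dst, ann in edges:
            updated[dst] = sr.plus(updated[dst], sr.times(value[src], ann))
        value = updated
    return value[target]


# Hypothetical process: transitions annotated with a "user effort" cost.
STATES = ["home", "search", "cart", "pay"]
COSTS = [
    ("home", "search", 1.0),
    ("search", "search", 1.0),  # loop: infinitely many executions
    ("search", "cart", 2.0),
    ("home", "cart", 5.0),
    ("cart", "pay", 3.0),
]


def kept(edges, removed):
    """Re-annotate transitions with booleans: False for hypothetically removed ones."""
    return [(s, d, (s, d) not in removed) for s, d, _ in edges]


cheapest = reach_annotation(TROPICAL, STATES, COSTS, "home", "pay")
still_ok = reach_annotation(
    BOOLEAN, STATES, kept(COSTS, {("search", "cart")}), "home", "pay")
blocked = reach_annotation(
    BOOLEAN, STATES, kept(COSTS, {("search", "cart"), ("home", "cart")}), "home", "pay")
print(cheapest, still_ok, blocked)  # 6.0 True False
```

The sketch re-runs the evaluation for each semiring and each scenario. A provenance-based approach would instead aim to compute one symbolic annotation and specialize it afterwards, which this toy example does not attempt.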

Notes

  1. The semantics of LTL is more commonly defined with respect to infinite executions; however, from a provenance perspective, we are interested only in finite prefixes, as is also the case in [23], which works directly with traces.

  2. Here, user effort is a quantity associated with different actions, such as following a link or filling in a text box.

  3. We follow the common practice of analyzing data complexity, so \(\phi\) and the guarding queries are considered to be of constant size.

References

  1. Abiteboul, S., Hull, R., Vianu, V.: Foundations of Databases. Addison-Wesley, Reading, MA (1995)

  2. Abiteboul, S., Vianu, V., Fordham, B., Yesha, Y.: Relational transducers for electronic commerce. In: PODS, ACM, Seattle, 1–3 June 1998

  3. Ailamaki, A., Ioannidis, Y.E., Livny, M.: Scientific workflow management by database management. In: SSDBM, IEEE, Capri, Italy, 1–3 July 1998

  4. Akroun, L., Benatallah, B., Nourine, L., Toumani, F.: Decidability and complexity of simulation preorder for data-centric web services. In: ICSOC, pp. 535–542, Springer, Paris, 3–6 Nov 2014

  5. Amsterdamer, Y., Davidson, S.B., Deutch, D., Milo, T., Stoyanovich, J., Tannen, V.: Putting lipstick on pig: enabling database-style workflow provenance. PVLDB 5(4), 346–357 (2011)

  6. Amsterdamer, Y., Deutch, D., Tannen, V.: Provenance for aggregate queries. In: Proceedings of PODS, ACM, Athens, 12–16 June 2011

  7. Benjelloun, O., Sarma, A.D., Halevy, A.Y., Theobald, M., Widom, J.: Databases with uncertainty and lineage. VLDB J. 17, 243–264 (2008)

  8. http://www.bpmn.org/

  9. Brzozowski, J.A.: Derivatives of regular expressions. J. ACM 11(4), 481–494 (1964)

  10. Buneman, P., Cheney, J., Vansummeren, S.: On the expressiveness of implicit provenance in query and update languages. ACM Trans. Database Syst. 33(4) (2008). doi:10.1145/1412331.1412340

  11. Buneman, P., Khanna, S., Tan, W.C.: Why and where: a characterization of data provenance. In: ICDT, Springer, London, 4–6 Jan 2001

  12. Cheney, J., Chiticariu, L., Tan, W.C.: Provenance in databases: why, how, and where. Found. Trends Databases 1(4), 379–474 (2009)

  13. Cohn, D., Hull, R.: Business artifacts: a data-centric approach to modeling business operations and processes. IEEE Data Eng. Bull. 32(3), 3–9 (2009)

  14. Davidson, S.B., Freire, J.: Provenance and scientific workflows: challenges and opportunities. In: SIGMOD, ACM, Vancouver, 10–12 June 2008

  15. Daws, C.: Symbolic and parametric model checking of discrete-time Markov chains. In: Theoretical Aspects of Computing-ICTAC 2004, pp. 280–294. Springer, Berlin (2005)

  16. Deutch, D., Ives, Z.G., Milo, T., Tannen, V.: Caravan: provisioning for what-if analysis. In: CIDR, Asilomar, 6–9 Jan 2013. http://www.cidrdb.org/

  17. Deutch, D., Moskovitch, Y., Tannen, V.: Propolis: provisioned analysis of data-centric processes (demo). In: VLDB (2013)

  18. Deutsch, A., Sui, L., Vianu, V., Zhou, D.: A system for specification and verification of interactive, data-driven web applications. In: SIGMOD Conference, ACM, Chicago, 27–29 June 2006

  19. Fink, R., Han, L., Olteanu, D.: Aggregation in probabilistic databases via knowledge compilation. PVLDB 5(5), 490–501 (2012)

  20. Foster, I., Vockler, J., Wilde, M., Zhao, A.: Chimera: a virtual data system for representing, querying, and automating data derivation. In: SSDBM (2002)

  21. Foster, J.N., Green, T.J., Tannen, V.: Annotated XML: queries and provenance. In: PODS, pp. 271–280, ACM, Vancouver, 9–11 June 2008

  22. Fu, X., Bultan, T., Su, J.: WSAT: a tool for formal analysis of web services. In: CAV, Springer, Boston, 13–17 July 2004

  23. Giannakopoulou, D., Havelund, K.: Automata-based verification of temporal properties on running programs. In: ASE, pp. 412–416, IEEE, Coronado Island, San Diego, 26–29 Nov 2001

  24. Gillmann, M., Mindermann, R., Weikum, G.: Benchmarking and configuration of workflow management systems. In: CoopIS (2000)

  25. Gondran, M., Minoux, M.: Graphs, Dioids and Semirings: New Models and Algorithms. Springer, Berlin (2008)

  26. Green, T.J., Karvounarakis, G., Tannen, V.: Provenance semirings. In: PODS, ACM, Beijing, 11–13 June 2007

  27. Green, T.J., Karvounarakis, G., Ives, Z.G., Tannen, V.: Provenance in orchestra. IEEE Data Eng. Bull. 33(3), 9–16 (2010)

  28. Gruber, H., Holzer, M.: Finite automata, digraph connectivity, and regular expression size. In: ICALP, Springer, Reykjavik, 7–11 July 2008

  29. Hull, D., Wolstencroft, K., Stevens, R., Goble, C., Pocock, M., Li, P., Oinn, T.: Taverna: a tool for building and running workflows of services. Nucleic Acids Res. 34, 729–732 (2006)

  30. Hune, T., Romijn, J., Stoelinga, M., Vaandrager, F.: Linear Parametric Model Checking of Timed Automata. Springer, Berlin (2001)

  31. Kostylev, E.V., Buneman, P.: Combining dependent annotations for relational algebra. In: ICDT, pp. 196–207, ACM, Berlin, 26–29 Mar 2012

  32. Manna, Z., Pnueli, A.: The Temporal Logic of Reactive and Concurrent Systems: Specification. Springer, Berlin (1992)

  33. Meliou, A., Suciu, D.: Tiresias: the database oracle for how-to queries. In: SIGMOD, ACM, Scottsdale, 20–24 May 2012

  34. Meliou, A., Gatterbauer, W., Suciu, D.: Reverse data management. PVLDB 4(12), 1490–1493 (2011)

  35. Missier, P., Paton, N., Belhajjame, K.: Fine-grained and efficient lineage querying of collection-based workflow provenance. In: EDBT (2010)

  36. http://www.myexperiment.org/

  37. Pnueli, A.: Applications of Temporal Logic to the Specification and Verification of Reactive Systems: A Survey of Current Trends. Springer, Berlin (1986)

  38. Simmhan, Y.L., Plale, B., Gannon, D.: Karma2: provenance management for data-driven workflows. Int. J. Web Service Res. 5(2), 1–22 (2008)

  39. Ullman, J.D.: Principles of Database and Knowledge-Base Systems. Computer Science Press, Rockville, MD (1989)

  40. PROV-Overview, W3C Working Group Note. http://www.w3.org/TR/prov-overview/ (2013)

  41. Constraints of the PROV data model, W3C working group note. http://www.w3.org/TR/2013/REC-prov-constraints-20130430/ (2014)

Author information

Corresponding author

Correspondence to Daniel Deutch.

Additional information

This research was partially supported by the Israeli Ministry of Science, the Israeli Science Foundation, the National Science Foundation (NSF IIS 1217798), the US-Israel Binational Science Foundation, the Broadcom Foundation and Tel Aviv University Authentication Initiative.

About this article

Cite this article

Deutch, D., Moskovitch, Y. & Tannen, V. Provenance-based analysis of data-centric processes. The VLDB Journal 24, 583–607 (2015). https://doi.org/10.1007/s00778-015-0390-5

  • DOI: https://doi.org/10.1007/s00778-015-0390-5
