Abstract
We consider in this paper static analysis of the possible executions of data-dependent applications, namely applications whose control flow is guided by a finite-state machine as well as by the state of an underlying database. We note that previous work in this context has not addressed two important features of such analysis: analysis under hypothetical scenarios, such as changes to the application’s state machine and/or to the underlying database, and the consideration of meta-data, such as cost or access privileges. Observing that semiring-based provenance has proven highly effective in supporting these two features for database queries, we develop in this paper a semiring-based provenance framework for the analysis of data-dependent processes, accounting for hypothetical reasoning and meta-data. The development addresses two interacting new challenges: (1) combining provenance annotations for both information that resides in the database and information about external inputs (e.g., user choices), and (2) finitely capturing infinitely many process executions. We have implemented our framework as part of the PROPOLIS system.
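As a rough illustration of the semiring idea mentioned in the abstract, the following Python sketch annotates the transitions of a small state machine with semiring values and combines them over executions: annotations along one execution are multiplied, and alternative executions are summed. This is a minimal sketch under stated assumptions, not the construction developed in the paper: the `Semiring` class, the `execution_provenance` function, the example states and the costs are all hypothetical, database guards are omitted, and the infinitely many executions are simply cut off at `max_steps` transitions rather than captured finitely.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Generic, Iterable, Tuple, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Semiring(Generic[T]):
    """A commutative semiring given by its two constants and two operations."""
    zero: T                      # neutral element of plus
    one: T                       # neutral element of times
    plus: Callable[[T, T], T]    # combines alternative executions
    times: Callable[[T, T], T]   # combines consecutive transitions


# Tropical semiring (min, +): annotations are costs, the result is the cheapest execution.
TROPICAL = Semiring(zero=float("inf"), one=0.0, plus=min, times=lambda a, b: a + b)
# Boolean semiring (or, and): annotations say whether a transition is enabled.
BOOLEAN = Semiring(zero=False, one=True,
                   plus=lambda a, b: a or b, times=lambda a, b: a and b)


def execution_provenance(states: Iterable[str],
                         transitions: Dict[Tuple[str, str], T],
                         start: str, goal: str,
                         sr: Semiring, max_steps: int) -> T:
    """Sum, in `sr`, the annotation products of all executions from `start`
    to `goal` that use at most `max_steps` transitions."""
    states = list(states)
    # reach[s]: combined annotation of executions with exactly i steps ending in s
    reach = {s: sr.zero for s in states}
    reach[start] = sr.one
    total = reach[goal]
    for _ in range(max_steps):
        nxt = {s: sr.zero for s in states}
        for (src, dst), ann in transitions.items():
            nxt[dst] = sr.plus(nxt[dst], sr.times(reach[src], ann))
        reach = nxt
        total = sr.plus(total, reach[goal])
    return total


# Hypothetical process: every transition carries an assumed "user effort" cost.
STATES = ["browse", "cart", "pay", "done"]
COSTS = {
    ("browse", "cart"): 2.0,
    ("cart", "browse"): 1.0,
    ("cart", "pay"): 3.0,
    ("browse", "pay"): 7.0,
    ("pay", "done"): 1.0,
}

cheapest = execution_provenance(STATES, COSTS, "browse", "done", TROPICAL, 3)

# Hypothetical scenario: disable the cart -> pay transition and re-evaluate reachability.
enabled = {edge: True for edge in COSTS}
enabled[("cart", "pay")] = False
still_reachable = execution_provenance(STATES, enabled, "browse", "done", BOOLEAN, 3)
```

Swapping the semiring changes the question being asked (cheapest execution versus reachability) without changing the traversal, which is the property that makes the annotation-based view convenient for hypothetical reasoning.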
Notes
The semantics of LTL is more commonly defined with respect to infinite executions; however, from a provenance perspective, we will only be interested in finite prefixes, as is also the case in [23], which works directly with traces.
Here, user effort is some quantification associated with different actions, such as following a link or filling in a text box.
We follow the common practice of analyzing data complexity, so \(\phi\) and the guarding queries are considered to be of constant size.
References
Abiteboul, S., Hull, R., Vianu, V.: Foundations of Databases. Addison-Wesley, Reading, MA (1995)
Abiteboul, S., Vianu, V., Fordham, B., Yesha, Y.: Relational transducers for electronic commerce. In: PODS, ACM, Seattle, 1–3 June 1998
Ailamaki, A., Ioannidis, Y.E., Livny, M.: Scientific workflow management by database management. In: SSDBM, IEEE, Capri, Italy, 1–3 July 1998
Akroun, L., Benatallah, B., Nourine, L., Toumani, F.: Decidability and complexity of simulation preorder for data-centric web services. In: ICSOC, pp. 535–542, Springer, Paris, 3–6 Nov 2014
Amsterdamer, Y., Davidson, S.B., Deutch, D., Milo, T., Stoyanovich, J., Tannen, V.: Putting lipstick on pig: enabling database-style workflow provenance. PVLDB 5(4), 346–357 (2011)
Amsterdamer, Y., Deutch, D., Tannen, V.: Provenance for aggregate queries. In: Proceedings of PODS, ACM, Athens, 12–16 June 2011
Benjelloun, O., Sarma, A.D., Halevy, A.Y., Theobald, M., Widom, J.: Databases with uncertainty and lineage. VLDB J. 17, 243–264 (2008)
Brzozowski, J.A.: Derivatives of regular expressions. J. ACM 11(4), 481–494 (1964)
Buneman, P., Cheney, J., Vansummeren, S.: On the expressiveness of implicit provenance in query and update languages. ACM Trans. Database Syst. 33(4) (2008). doi:10.1145/1412331.1412340
Buneman, P., Khanna, S., Tan, W.C.: Why and where: a characterization of data provenance. In: ICDT, Springer, London, 4–6 Jan 2001
Cheney, J., Chiticariu, L., Tan, W.C.: Provenance in databases: why, how, and where. Found. Trends Databases 1(4), 379–474 (2009)
Cohn, D., Hull, R.: Business artifacts: a data-centric approach to modeling business operations and processes. IEEE Data Eng. Bull. 32(3), 3–9 (2009)
Davidson, S.B., Freire, J.: Provenance and scientific workflows: challenges and opportunities. In: SIGMOD, ACM, Vancouver, 10–12 June 2008
Daws, C.: Symbolic and parametric model checking of discrete-time Markov chains. In: Theoretical Aspects of Computing-ICTAC 2004, pp. 280–294. Springer, Berlin (2005)
Deutch, D., Ives, Z.G., Milo, T., Tannen, V.: Caravan: provisioning for what-if analysis. In: CIDR, Asilomar, 6–9 Jan 2006. http://www.cidrdb.org/
Deutch, D., Moskovitch, Y., Tannen, V.: Propolis: provisioned analysis of data-centric processes (demo). In: VLDB (2013)
Deutsch, A., Sui, L., Vianu, V., Zhou, D.: A system for specification and verification of interactive, data-driven web applications. In: SIGMOD Conference, ACM, Chicago, 27–29 June 2006
Fink, R., Han, L., Olteanu, D.: Aggregation in probabilistic databases via knowledge compilation. PVLDB 5(5), 490–501 (2012)
Foster, I., Vockler, J., Wilde, M., Zhao, A.: Chimera: a virtual data system for representing, querying, and automating data derivation. In: SSDBM (2002)
Foster, J.N., Green, T.J., Tannen, V.: Annotated XML: queries and provenance. In: PODS, pp. 271–280, ACM, Vancouver, 9–11 June 2008
Fu, X., Bultan, T., Su, J.: WSAT: a tool for formal analysis of web services. In: CAV, Springer, Boston, 13–17 July 2004
Giannakopoulou, D., Havelund, K.: Automata-based verification of temporal properties on running programs. In: ASE, pp. 412–416, IEEE, Coronado Island, San Diego, 26–29 Nov 2001
Gillmann, M., Mindermann, R., Weikum, G.: Benchmarking and configuration of workflow management systems. In: CoopIS (2000)
Gondran, M., Minoux, M.: Graphs, Dioids and Semirings: New Models and Algorithms. Springer, Berlin (2008)
Green, T.J., Karvounarakis, G., Tannen, V.: Provenance semirings. In: PODS, ACM, Beijing, 11–13 June 2007
Green, T.J., Karvounarakis, G., Ives, Z.G., Tannen, V.: Provenance in orchestra. IEEE Data Eng. Bull. 33(3), 9–16 (2010)
Gruber, H., Holzer, M.: Finite automata, digraph connectivity, and regular expression size. In: ICALP, Springer, Reykjavik, 7–11 July 2008
Hull, D., Wolstencroft, K., Stevens, R., Goble, C., Pocock, M., Li, P., Oinn, T.: Taverna: a tool for building and running workflows of services. Nucleic Acids Res. 34, 729–732 (2006)
Hune, T., Romijn, J., Stoelinga, M., Vaandrager, F.: Linear Parametric Model Checking of Timed Automata. Springer, Berlin (2001)
Kostylev, E.V., Buneman, P.: Combining dependent annotations for relational algebra. In: ICDT, pp. 196–207, ACM, Berlin, 26–29 Mar 2012
Manna, Z., Pnueli, A.: The Temporal Logic of Reactive and Concurrent Systems: Specification. Springer, Berlin (1992)
Meliou, A., Suciu, D.: Tiresias: the database oracle for how-to queries. In: SIGMOD, ACM, Scottsdale, 20–24 May 2012
Meliou, A., Gatterbauer, W., Suciu, D.: Reverse data management. PVLDB 4(12), 1490–1493 (2011)
Missier, P., Paton, N., Belhajjame, K.: Fine-grained and efficient lineage querying of collection-based workflow provenance. In: EDBT (2010)
Pnueli, A.: Applications of Temporal Logic to the Specification and Verification of Reactive Systems: A Survey of Current Trends. Springer, Berlin (1986)
Simmhan, Y.L., Plale, B., Gannon, D.: Karma2: provenance management for data-driven workflows. Int. J. Web Service Res. 5(2), 1–22 (2008)
Ullman, J.D.: Principles of Database and Knowledge-Base Systems. Computer Science Press, Rockville, MD (1989)
PROV-Overview, W3C working group note. http://www.w3.org/TR/prov-overview/ (2013)
Constraints of the PROV data model, W3C working group note. http://www.w3.org/TR/2013/REC-prov-constraints-20130430/ (2014)
Additional information
This research was partially supported by the Israeli Ministry of Science, the Israeli Science Foundation, the National Science Foundation (NSF IIS 1217798), the US-Israel Binational Science Foundation, the Broadcom Foundation and Tel Aviv University Authentication Initiative.
Cite this article
Deutch, D., Moskovitch, Y. & Tannen, V. Provenance-based analysis of data-centric processes. The VLDB Journal 24, 583–607 (2015). https://doi.org/10.1007/s00778-015-0390-5
DOI: https://doi.org/10.1007/s00778-015-0390-5