Exploring Scientific Workflow Provenance Using Hybrid Queries over Nested Data and Lineage Graphs

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5566)

Abstract

Existing approaches for representing the provenance of scientific workflow runs largely ignore computation models that work over structured data, including XML. Unlike models based on transformation semantics, these computation models often employ update semantics, in which only a portion of an incoming XML stream is modified by each workflow step. Applying conventional provenance approaches to such models results in provenance information that is either too coarse (e.g., stating that one version of an XML document depends entirely on a prior version) or potentially incorrect (e.g., stating that each element of an XML document depends on every element in a prior version). We describe a generic provenance model that naturally represents workflow runs involving processes that work over nested data collections and that employ update semantics. Moreover, we extend current query approaches to support our model, enabling queries to be posed not only over data lineage relationships, but also over versions of nested data structures produced during a workflow run. We show how hybrid queries can be expressed against our model using high-level query constructs and implemented efficiently over relational provenance storage schemes.
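
The contrast between coarse, document-level provenance and fine-grained provenance under update semantics can be illustrated with a small sketch. This is not the paper's model or query language: the `Trace` class, its method names, and the example node and step names are all hypothetical, chosen only to show how each inserted item in a nested collection can record explicit dependencies, and how a hybrid query can combine a structural condition (nesting) with a lineage condition.

```python
from dataclasses import dataclass, field


@dataclass
class Trace:
    """Toy provenance trace (hypothetical): nesting plus explicit dependency edges."""
    parent: dict = field(default_factory=dict)      # node id -> parent node id (nesting)
    created_by: dict = field(default_factory=dict)  # node id -> step that inserted the node
    deps: dict = field(default_factory=dict)        # node id -> ids of nodes it was derived from

    def insert(self, node, parent, step, derived_from=()):
        # Update semantics: a step adds a node and leaves the rest of the tree alone.
        self.parent[node] = parent
        self.created_by[node] = step
        self.deps[node] = set(derived_from)

    def lineage(self, node):
        """All nodes that `node` transitively depends on."""
        seen, stack = set(), [node]
        while stack:
            for dep in self.deps.get(stack.pop(), ()):
                if dep not in seen:
                    seen.add(dep)
                    stack.append(dep)
        return seen

    def descendants(self, root):
        """All nodes nested at any depth under `root`."""
        found = set()
        for node in self.parent:
            p = self.parent.get(node)
            while p is not None:
                if p == root:
                    found.add(node)
                    break
                p = self.parent.get(p)
        return found


# Hypothetical run: three steps each add one item to the same nested document.
t = Trace()
t.insert("doc", None, "input")
t.insert("seq1", "doc", "input")
t.insert("seq2", "doc", "input")
t.insert("align", "doc", "step_align", derived_from=["seq1", "seq2"])
t.insert("tree", "doc", "step_infer", derived_from=["align"])
t.insert("note", "doc", "step_annotate", derived_from=["seq2"])

# Lineage query: "note" depends only on "seq2", not on the whole prior document.
note_lineage = t.lineage("note")

# Hybrid query: items nested under "doc" whose lineage includes "seq1".
hybrid = {n for n in t.descendants("doc") if "seq1" in t.lineage(n)}
print(sorted(note_lineage), sorted(hybrid))
```

Under a coarse model, every version of `doc` would simply depend on the previous version. Here the annotation `note` is traced to `seq2` alone, while `align` and `tree` are the only items derived from `seq1`.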

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Anand, M.K., Bowers, S., McPhillips, T., Ludäscher, B. (2009). Exploring Scientific Workflow Provenance Using Hybrid Queries over Nested Data and Lineage Graphs. In: Winslett, M. (eds) Scientific and Statistical Database Management. SSDBM 2009. Lecture Notes in Computer Science, vol 5566. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02279-1_18

  • DOI: https://doi.org/10.1007/978-3-642-02279-1_18

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-02278-4

  • Online ISBN: 978-3-642-02279-1

  • eBook Packages: Computer Science, Computer Science (R0)
