Tracking provenance of earth science data

  • Research Article
Earth Science Informatics

Abstract

Tremendous volumes of data have been captured, archived and analyzed, and the sensors, algorithms and processing systems used to transform and analyze those data evolve over time. Web portals and services can create transient data sets on demand, and data are transferred from organization to organization, with additional transformations at every stage. Provenance in this context refers to the source of data and a record of the process that led to its current state; it encompasses documentation of the various artifacts related to a particular dataset. Provenance is important for understanding and using scientific datasets, and critical for independent confirmation of scientific results. Managing provenance throughout scientific data processing has recently attracted growing interest, and a variety of approaches exist. Large-scale scientific datasets, consisting of thousands to millions of individual data files and processes, pose particular challenges. This paper uses the analogy of provenance in art history to explore some of the concerns of applying provenance tracking to earth science data. It also illustrates some of these provenance issues with examples drawn from the Ozone Monitoring Instrument (OMI) Data Processing System (OMIDAPS) (Tilmes et al. 2004), run at NASA’s Goddard Space Flight Center by the first author.
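The abstract defines provenance as the source of data plus a record of the process that led to its current state. As a minimal sketch, assuming a hypothetical per-file record schema (the class names, fields and file identifiers below are illustrative and are not taken from OMIDAPS), such records can be chained and walked back to recover a file's full lineage: every upstream file, the original sources, and the process runs involved.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class ProcessRun:
    """One execution of a processing step (hypothetical schema)."""
    algorithm: str                                 # name of the program that ran
    version: str                                   # software version; evolves over time
    parameters: Tuple[Tuple[str, str], ...] = ()   # (name, value) pairs, kept hashable


@dataclass
class ProvenanceRecord:
    """Provenance of one data file: the run that produced it and its direct inputs."""
    file_id: str
    run: Optional[ProcessRun] = None   # None marks an original source, e.g. raw instrument data
    inputs: List[str] = field(default_factory=list)


def lineage(file_id: str, records: Dict[str, ProvenanceRecord]):
    """Walk input links back from file_id.

    Returns (ancestors, sources, runs): all upstream file ids, the subset of
    files with no recorded inputs (original sources), and every run involved.
    """
    ancestors: Set[str] = set()
    sources: Set[str] = set()
    runs: Set[ProcessRun] = set()
    stack = [file_id]
    while stack:
        rec = records[stack.pop()]
        if rec.run is not None:
            runs.add(rec.run)
        if not rec.inputs:
            sources.add(rec.file_id)
        for parent in rec.inputs:
            if parent not in ancestors:    # shared inputs are visited only once
                ancestors.add(parent)
                stack.append(parent)
    return ancestors, sources, runs


# Hypothetical example: a level-2 product derived from level-1B and a calibration table.
records = {r.file_id: r for r in (
    ProvenanceRecord("L0_granule"),
    ProvenanceRecord("calibration_table"),
    ProvenanceRecord("L1B_granule", ProcessRun("level1b", "1.2"),
                     ["L0_granule", "calibration_table"]),
    ProvenanceRecord("L2_ozone", ProcessRun("ozone_retrieval", "3.0"),
                     ["L1B_granule", "calibration_table"]),
)}
ancestors, sources, runs = lineage("L2_ozone", records)
```

Here `ancestors` is `{"L1B_granule", "calibration_table", "L0_granule"}` and `sources` is `{"L0_granule", "calibration_table"}`. Using an explicit stack with a visited set keeps a shared input, such as a calibration table used at several processing levels, from being revisited, which matters when a lineage fans out over thousands of files.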

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5

Notes

  1. http://www.foaf-project.org

  2. Cool URIs don’t change, http://www.w3.org/Provider/Style/URI

  3. The graph notation follows the Open Provenance Model (Moreau et al. 2008a): arrows point from each artifact back to the inputs from which it was derived.

  4. http://aws.amazon.com/ec2

  5. http://nebula.nasa.gov
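Note 3 states that the figures follow the Open Provenance Model convention, in which arrows run from an artifact back to what it came from. The sketch below encodes the three OPM causal relations (used, wasGeneratedBy, wasDerivedFrom) as (effect, relation, cause) triples, checks that each edge connects the right node types, and renders the graph as Graphviz DOT with every arrow pointing from effect to cause. The DOT output format, the `check_edge` and `to_dot` helpers and the node names are assumptions for illustration, not the authors' tooling.

```python
from typing import Iterable, List, Set, Tuple

# (effect, relation, cause): each arrow runs from an effect back to its cause,
# following the Open Provenance Model direction described in note 3.
Edge = Tuple[str, str, str]

# For each relation: (is the effect a process?, is the cause a process?)
OPM_ENDPOINTS = {
    "used": (True, False),             # process used artifact
    "wasGeneratedBy": (False, True),   # artifact was generated by process
    "wasDerivedFrom": (False, False),  # artifact was derived from artifact
}


def check_edge(edge: Edge, processes: Set[str]) -> None:
    """Raise ValueError if the relation is unknown or the endpoint types are wrong."""
    effect, relation, cause = edge
    if relation not in OPM_ENDPOINTS:
        raise ValueError(f"unknown relation {relation!r}")
    if (effect in processes, cause in processes) != OPM_ENDPOINTS[relation]:
        raise ValueError(f"bad endpoint types for {edge}")


def to_dot(edges: Iterable[Edge], processes: Set[str]) -> str:
    """Render a provenance graph as DOT: processes as boxes, artifacts as default ellipses."""
    lines: List[str] = ["digraph provenance {"]
    for p in sorted(processes):
        lines.append(f'  "{p}" [shape=box];')
    for edge in edges:
        check_edge(edge, processes)
        effect, relation, cause = edge
        lines.append(f'  "{effect}" -> "{cause}" [label="{relation}"];')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical example: one processing run turning a level-0 granule into level-1B.
processes = {"level1b_run"}
edges = [
    ("L1B_granule", "wasGeneratedBy", "level1b_run"),
    ("level1b_run", "used", "L0_granule"),
    ("L1B_granule", "wasDerivedFrom", "L0_granule"),
]
dot = to_dot(edges, processes)
print(dot)
```

Validating endpoint types up front catches edges written in the wrong direction, such as an artifact claiming to have "used" a process, before they end up in a rendered figure.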

References

  • Bose R, Frew J (2005) Lineage retrieval for scientific data processing: a survey. ACM Comput Surv 37(1):1–28. doi:10.1145/1057977.1057978

  • da Silva PP, McGuinness DL, Fikes R (2006) A proof markup language for Semantic Web services. Inf Syst 31(4–5):381–395 (special issue: The Semantic Web and Web Services). doi:10.1016/j.is.2005.02.003

  • Freire J, Missier P, Moreau L, Schreiber A, Mattoso M, Silva CT (2008) Provenance and annotation of data and processes, vol 5272/2008. Springer, Berlin. doi:10.1007/978-3-540-89965-5

  • Heinis T, Alonso G (2008) Efficient lineage tracking for scientific workflows. In: SIGMOD ’08: proceedings of the 2008 ACM SIGMOD international conference on management of data. ACM, New York, pp 1007–1018. doi:10.1145/1376616.1376716

  • Moreau L, Ludäscher B, Altintas I, Barga RS, Bowers S, Callahan S, Chin GJ, Clifford B, Cohen S, Cohen-Boulakia S, Davidson S, Deelman E, Digiampietri L, Foster I, Freire J, Frew J, Futrelle J, Gibson T, Gil Y, Goble C, Golbeck J, Groth P, Holland DA, Jiang S, Kim J, Koop D, Krenek A, McPhillips T, Mehta G, Miles S, Metzger D, Munroe S, Myers J, Plale B, Podhorszki N, Ratnakar V, Santos E, Scheidegger C, Schuchardt K, Seltzer M, Simmhan YL, Silva C, Slaughter P, Stephan E, Stevens R, Turi D, Vo H, Wilde M, Zhao J, Zhao Y (2007) Special issue: the first provenance challenge. Concurr Comput: Practice and Experience 20(5):409–418. doi:10.1002/cpe.1233

  • Moreau L, Freire J, Futrelle J, McGrath R, Myers J, Paulson P (2008a) The open provenance model: an overview. In: Provenance and annotation of data and processes. Springer, Berlin, pp 323–326. doi:10.1007/978-3-540-89965-5_31

  • Moreau L, Groth P, Miles S, Vazquez-Salceda J, Ibbotson J, Jiang S, Munroe S, Rana O, Schreiber A, Tan V, Varga L (2008b) The provenance of electronic data. Commun ACM 51(4):52–58. doi:10.1145/1330311.1330323

  • Nurmi D, Wolski R, Grzegorczyk C, Obertelli G, Soman S, Youseff L, Zagorodnov D (2009) The eucalyptus open-source cloud-computing system. In: CCGRID ’09: proceedings of the 2009 9th IEEE/ACM international symposium on cluster computing and the grid. IEEE Computer Society, Washington, DC, pp 124–131. doi:10.1109/CCGRID.2009.93

  • Simmhan YL, Plale B, Gannon D (2005) A survey of data provenance in e-science. SIGMOD Rec 34(3):31–36. doi:10.1145/1084805.1084812

  • Suarez-Sola I, Davey A, Hourcle JA (2008) What are we tracking ... and why? AGU Fall Meeting Abstracts, pp C1047+

  • Tilmes C, Linda M, Fleig A (2004) Development of two Science Investigator-led Processing Systems (SIPS) for NASA’s Earth Observation System (EOS). In: IGARSS ’04: proceedings of the 2004 IEEE international geoscience and remote sensing symposium, vol 3, pp 2190–2195. doi:10.1109/IGARSS.2004.1370795

Acknowledgement

Thanks to the NASA MODIS and OMI Data Processing teams.

Author information

Corresponding author

Correspondence to Curt Tilmes.

Additional information

Communicated by: Thomas Narock

About this article

Cite this article

Tilmes, C., Yesha, Y. & Halem, M. Tracking provenance of earth science data. Earth Sci Inform 3, 59–65 (2010). https://doi.org/10.1007/s12145-010-0046-3

  • DOI: https://doi.org/10.1007/s12145-010-0046-3