
Data Genome: An Abstract Model for Data Evolution

  • Conference paper
Advances in Computation and Intelligence (ISICA 2007)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4683)

Abstract

Modern information systems often process data that has been transferred, transformed, or integrated from a variety of sources. In many application domains, information about how data items were derived is crucial. A kind of metadata called data provenance is currently being investigated by many researchers, but provenance information must be collected and maintained explicitly by the dataset maintainer or by a specialized provenance management system. In this paper we investigate how to support derivation information for applications within the dataset itself. We propose that every dataset has a unique data genome that evolves as the dataset evolves. The data genome is part of the data and actively records derivation information for it. The characteristics of data genomes show that the lineage of datasets can be uncovered by analyzing their data genomes. We also present computations on data genomes, such as clone, transmit, mutate and introject, to show how a data genome evolves to provide derivation information from the dataset itself.
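The abstract names four computations (clone, transmit, mutate, introject) without defining them here. As a rough illustration of the general idea of a derivation record that travels with the data, the following Python sketch models a genome as an identifier plus an append-only event history. Everything in it is an assumption made for illustration: the class names `Event` and `DataGenome`, the helper `ancestors`, the choice of which operations allocate a new identifier, and the way histories are merged are hypothetical and are not taken from the paper's formal model.

```python
from dataclasses import dataclass
from typing import Set, Tuple
import uuid


@dataclass(frozen=True)
class Event:
    """One derivation step (hypothetical structure, not the paper's)."""
    op: str                        # "create", "clone", "transmit", ...
    detail: str                    # free-text description of the step
    parents: Tuple[str, ...] = ()  # ids of genomes this step derives from


@dataclass(frozen=True)
class DataGenome:
    """Derivation record carried by a dataset (illustrative assumption)."""
    genome_id: str
    history: Tuple[Event, ...] = ()

    @staticmethod
    def new(detail: str = "created") -> "DataGenome":
        return DataGenome(uuid.uuid4().hex, (Event("create", detail),))

    def clone(self) -> "DataGenome":
        # Assumed: a copy gets a fresh id and remembers its parent.
        event = Event("clone", "copied", (self.genome_id,))
        return DataGenome(uuid.uuid4().hex, self.history + (event,))

    def transmit(self, destination: str) -> "DataGenome":
        # Assumed: moving a dataset keeps its id and logs the destination.
        event = Event("transmit", destination)
        return DataGenome(self.genome_id, self.history + (event,))

    def mutate(self, description: str) -> "DataGenome":
        # Assumed: a transformation keeps the id and logs what changed.
        event = Event("mutate", description)
        return DataGenome(self.genome_id, self.history + (event,))

    def introject(self, other: "DataGenome") -> "DataGenome":
        # Assumed: integrating another dataset absorbs its history.
        absorbed = tuple(e for e in other.history if e not in self.history)
        event = Event("introject", "absorbed", (other.genome_id,))
        return DataGenome(self.genome_id, self.history + absorbed + (event,))


def ancestors(genome: DataGenome) -> Set[str]:
    """Ids of every genome referenced as a parent in the history."""
    return {p for e in genome.history for p in e.parents}


# Usage: derive B from A, then integrate B into an independent dataset C.
a = DataGenome.new("source A")
b = a.clone().mutate("normalized units")
c = DataGenome.new("source B").introject(b)
```

Under these assumptions, `ancestors(c)` contains the identifiers of both `a` and `b`, which is the sense in which lineage could be recovered from the genome alone rather than from an external provenance store.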

This work is supported by the Guangdong High-Tech Program (2006B80407001, 2006B 11301001) and the Guangzhou High-Tech Program (2006Z3-D3081).



Editor information

Lishan Kang, Yong Liu, Sanyou Zeng

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Tang, D., Xi, J., Guo, Y., Shen, S. (2007). Data Genome: An Abstract Model for Data Evolution. In: Kang, L., Liu, Y., Zeng, S. (eds) Advances in Computation and Intelligence. ISICA 2007. Lecture Notes in Computer Science, vol 4683. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74581-5_66

  • DOI: https://doi.org/10.1007/978-3-540-74581-5_66

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-74580-8

  • Online ISBN: 978-3-540-74581-5

  • eBook Packages: Computer Science, Computer Science (R0)
