Abstract
Modern information systems often process data that has been transferred, transformed or integrated from a variety of sources. In many application domains, information about how data items were derived is crucial. Data provenance, a kind of metadata that records this derivation, is currently being investigated by many researchers, but provenance information must be collected and maintained explicitly by the dataset maintainer or by a specialized provenance management system. In this paper we investigate how support for derivation information can be provided to applications by the dataset itself. We propose that every dataset has a unique data genome that evolves as the dataset evolves. The data genome is part of the data and actively records derivation information for it. The characteristics of the data genome show that the lineage of datasets can be uncovered by analyzing their data genomes. We also present computations on data genomes, namely clone, transmit, mutate and introject, to show how a data genome evolves to provide derivation information from the dataset itself.
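As a rough illustration of the idea that a dataset carries its own derivation record, the following Python sketch attaches a small "genome" to each dataset and updates it through the four named computations. The abstract names clone, transmit, mutate and introject but does not define them, so the `Gene` and `Dataset` types, every function signature, and the behaviour of each operation here are assumptions made for illustration only, not the authors' model.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical model: all names and rules below are illustrative assumptions.

@dataclass(frozen=True)
class Gene:
    op: str           # "origin", "clone", "transmit", "mutate" or "introject"
    dataset: str      # name of the dataset the event happened to
    detail: str = ""  # free-text description of the event

@dataclass
class Dataset:
    name: str
    rows: List[Tuple]
    genome: List[Gene] = field(default_factory=list)

    def __post_init__(self) -> None:
        # A dataset created from scratch starts with a single origin gene.
        if not self.genome:
            self.genome = [Gene("origin", self.name)]

def clone(ds: Dataset, new_name: str) -> Dataset:
    """Copy the data; the copy inherits the parent's genome plus a clone gene."""
    genes = ds.genome + [Gene("clone", new_name, f"from {ds.name}")]
    return Dataset(new_name, list(ds.rows), genes)

def transmit(ds: Dataset, destination: str) -> Dataset:
    """Record that the dataset was transferred; the data is unchanged."""
    ds.genome.append(Gene("transmit", ds.name, f"to {destination}"))
    return ds

def mutate(ds: Dataset, fn: Callable[[Tuple], Tuple], description: str) -> Dataset:
    """Transform every row and record the transformation."""
    ds.rows = [fn(row) for row in ds.rows]
    ds.genome.append(Gene("mutate", ds.name, description))
    return ds

def introject(ds: Dataset, other: Dataset) -> Dataset:
    """Integrate another dataset's rows and absorb its genome."""
    ds.rows = ds.rows + list(other.rows)
    for gene in other.genome:
        if gene not in ds.genome:
            ds.genome.append(gene)
    ds.genome.append(Gene("introject", ds.name, f"absorbed {other.name}"))
    return ds

def related(a: Dataset, b: Dataset) -> bool:
    """Two datasets share lineage if their genomes share an origin gene."""
    origins_a = {g.dataset for g in a.genome if g.op == "origin"}
    origins_b = {g.dataset for g in b.genome if g.op == "origin"}
    return bool(origins_a & origins_b)

# Usage: B is derived from A, C is independent until it absorbs B.
a = Dataset("A", [(1,), (2,)])
b = mutate(clone(a, "B"), lambda row: (row[0] * 10,), "scale by 10")
c = Dataset("C", [(3,)])
related_before = related(a, c)
introject(c, b)
related_after = related(a, c)
```

In this sketch lineage is read entirely from the genomes: no external provenance store is consulted, which is the property the abstract emphasizes. A real design would also need to decide how genomes are serialized with the data and how they stay compact as histories grow.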
This work is supported by the Guangdong High-Tech Program (2006B80407001, 2006B11301001) and the Guangzhou High-Tech Program (2006Z3-D3081).
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Tang, D., Xi, J., Guo, Y., Shen, S. (2007). Data Genome: An Abstract Model for Data Evolution. In: Kang, L., Liu, Y., Zeng, S. (eds) Advances in Computation and Intelligence. ISICA 2007. Lecture Notes in Computer Science, vol 4683. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74581-5_66
DOI: https://doi.org/10.1007/978-3-540-74581-5_66
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74580-8
Online ISBN: 978-3-540-74581-5
eBook Packages: Computer Science, Computer Science (R0)