Skip to main content
Log in

Automating the database schema evolution process

  • Special Issue Paper
  • Published:
The VLDB Journal Aims and scope Submit manuscript

Abstract

Supporting database schema evolution represents a long-standing challenge of practical and theoretical importance for modern information systems. In this paper, we describe techniques and systems for automating the critical tasks of migrating the database and rewriting the legacy applications. In addition to labor saving, the benefits delivered by these advances are many and include reliable prediction of outcome, minimization of downtime, system-produced documentation, and support for archiving, historical queries, and provenance. The PRISM/PRISM++ system delivers these benefits, by solving the difficult problem of automating the migration of databases and the rewriting of queries and updates. In this paper, we present the PRISM/PRISM++ system and the novel technology that made it possible. In particular, we focus on the difficult and previously unsolved problem of supporting legacy queries and updates under schema and integrity constraints evolution. The PRISM/PRISM++ approach consists in providing the users with a set of SQL-based Schema Modification Operators (SMOs), which describe how the tables in the old schema are modified into those in the new schema. In order to support updates, SMOs are extended with integrity constraints modification operators. By using recent results on schema mapping, the paper (i) characterizes the impact on integrity constraints of structural schema changes, (ii) devises representations that enable the rewriting of updates, and (iii) develop a unified approach for query and update rewriting under constraints. We complement the system with two novel tools: the first automatically collects and provides statistics on schema evolution histories, whereas the second derives equivalent sequences of SMOs from the migration scripts that were used for schema upgrades. These tools were used to produce an extensive testbed containing 15 evolution histories of scientific databases and web information systems, providing over 100 years of aggregate evolution histories and almost 2,000 schema evolution steps.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12

Similar content being viewed by others

Notes

  1. Source: http://www.businessintelligencelowdown.com/2007/02/top_10_largest_.html.

  2. From the SVN commit 5552 accessible at:  http://svn.wikimedia.org/viewvc/mediawiki?view=rev&revision=5552.

  3. PRISM is an acronym for Panta Rhei Information & Schema Manager—‘Panta Rhei’ (Everything is in flux) is often credited to Heraclitus. The project homepage is as follows:  http://yellowstone.cs.ucla.edu/schema-evolution/index.php/Prism.

  4. This is an adaptation of the classical view-update semantics [14, 25, 37] to our context, in which evolution operators replace the views.

  5. See: http://tinyurl.com/updaterewriting.

  6. See Ensembl CVS repository at: http://tinyurl.com/ensembl-schema.

  7. An \(exon\) is a nucleic acid sequence related to a portion of DNA.

  8. This information is derived from the CVS logs and from the SQL used for data migration.

  9. These are simple equality assertions about the value of a column and constants, supported by the SQL DDL.

  10. Note that we apply the definition only for the case when \(M\) is a functional mapping. This suffices in our context since we force evolution operators to be invertible (as explained below). In general, however, classical schema mappings [34] may associate several possible \(S_2\)-instances with a given \(S_1\)-instance.

  11. This process is semi-automatic, and the user is guided by the system in the selection –at evolution time, not at query rewriting time– of the inverse for each SMO [23].

  12. Note that some of the updates will now fail due to the stricter constraints. This is unavoidable to maintain the DB instance \(I_2\) consistent with \(IC_2\) and is in general well-accepted consequence of tightening constraints.

  13. This policy is only available for rewriting purposes, that is, for inverses of ICMOs, since the use for data migration would lead to an inconsistent DB instance: \(I_2 \not \models IC_2\).

  14. Note that the evolution is information preserving: (forward) thanks to the primary key on id, and (inverse) since the system automatically declares the integrity constraints valid in the output Footnote 14 continued

    Fig. 9
    figure 9

    Update rewriting through SMO

    schema (two primary keys on the id columns, and two cross foreign keys).

  15. We release the two datasets at: http://db.csail.mit.edu/wikipedia/ and http://db.csail.mit.edu/ensembldb/.

References

  1. http://publib.boulder.ibm.com/infocenter/mptoolic/v1r0/index.jsp?topic=/com.ibm.db2tools.chx.doc.ug/chxucoview01.htm

  2. http://www.embarcadero.com/products/db-change-manager

  3. http://www.idera.com/SQL-Server/

  4. http://www.liquibase.org/

  5. http://www.mysql.com/products/workbench/

  6. http://www.oracle.com/us/products/enterprise-manager/change-management-pack-11g-ds-068451.pdf

  7. http://www.red-gate.com/

  8. http://www.sqledit.com/index.html

  9. http://www.swissql.com/

  10. Abiteboul, S., Duschka, O.M.: Complexity of answering queries using materialized views. In: PODS, pp. 254–263 (1998)

  11. Abiteboul, S., Hull, R., Vianu, V.: Foundations of Databases. Addison Wesley, Reading (1995)

    MATH  Google Scholar 

  12. Afrati, F.N., Kolaitis, P.G.: Repair checking in inconsistent databases: Algorithms and complexity. In: ICDT, pp. 31–41 (2009)

  13. Arenas, M., Bertossi, L., Chomicki, J.: Consistent query answers in inconsistent databases. In: PODS, pp. 68–79 (1999)

  14. Bancilhon, F., Spyratos, N.: Update semantics of relational views. ACM Trans. Database Syst. 6(4), 557–575 (1981)

    Article  MATH  Google Scholar 

  15. Bernstein, P.A.: Applying model management to classical meta data problems. In: CIDR (2003)

  16. Bernstein, P.A., Green, T.J., Melnik, S., Nash, A.: Implementing mapping composition. VLDB J. 17(2), 333–353 (2008)

    Article  Google Scholar 

  17. Bohannon, A., Pierce, B.C., Vaughan, J.A.: Relational lenses: A language for updatable views. In: PODS, pp. 338–347 (2006)

  18. Cleve A., Hainaut, J.-L.: Co-transformations in database applications evolution. In: GTTSE, pp. 409–421 (2006)

  19. Curino, C., Ham, M., Moroni, F., Zaniolo, C.: Pantha rei data set. http://data.schemaevolution.org/ (2009)

  20. Curino, C., Moon, H.J., Deutsch, A., Zaniolo, C.: Update rewriting and integrity constraint maintenance in a schema evolution support system: Prism++. PVLDB 4(2), 117–128 (2010)

    Google Scholar 

  21. Curino, C., Moon, H.J., Ham, M., Zaniolo, C.: The prism workbench: Database schema evolution without tears. In: ICDE (2009)

  22. Curino, C., Moon, H.J., Tanca, L., Zaniolo, C.: Schema evolution in Wikipedia: Toward a web information system benchmark. ICEIS (2008)

  23. Curino, C., Moon, H.J., Zaniolo, C.: Graceful database schema evolution: The prism workbench. PVLDB 1(1), 761–772 (2008)

    Google Scholar 

  24. Curino, C., Moon, H.J.: C. Zaniolo. Managing the history of metadata in support for db archiving and schema evolution. In: ECDM (2008)

  25. Dayal, U., Bernstein, P.A.: On the correct translation of update operations on relational views. ACM Trans. Database Syst. 7(3), 381–416 (1982)

    Article  MathSciNet  MATH  Google Scholar 

  26. Deutsch, A., Nash, A., Remmel, J.: The chase revisited. In: PODS, pp. 149–158 (2008)

  27. Deutsch, A., Tannen, V.: Mars: A system for publishing xml from mixed and redundant storage. In: VLDB, pp. 201–212 (2003)

  28. Ensembl development team. Ensembl Genetic DB http://www.ensembl.org, 2009 (Online)

  29. Fagin, R.: Inverting schema mappings. ACM Trans. Database Syst. 32(4), 25:1–25:51 (2007)

    Google Scholar 

  30. Fagin, R., Kolaitis, P.G., Popa, L., Tan, W.-C.: Composing schema mappings: Second-order dependencies to the rescue. ACM Trans. Database Syst. 30(4), 994–1055 (2005)

    Article  Google Scholar 

  31. Fagin, R., Kolaitis, P.G., Popa, L., Tan, W.-C.: Quasi-inverses of schema mappings. In: PODS, pp. 123–132 (2007)

  32. Fagin, R., Kolaitis, P.G., Popa, L., Tan, W. C.: Reverse data exchange: Coping with nulls. In: PODS, pp. 23–32 (2009)

  33. Hartung, M., Terwilliger, J.F., Rahm, E.: Recent advances in schema and ontology evolution. In: Schema Matching and Mapping, pp. 149–190 (2011)

  34. Hernández, M.A., Miller, R.J., Haas, L.M.: Clio: A semi-automatic tool for schema mapping. In: SIGMOD, p. 607 (2001)

  35. Hick, J.-M., Hainaut, J.-L.: Database application evolution: A transformational approach. Data Knowl. Eng. 59(3), 534–558 (2006)

    Article  Google Scholar 

  36. Hull, R.: Non-finite specifiability of projections of functional dependency families. Theor. Comput. Sci. 39, 239–265 (1985)

    Article  MathSciNet  MATH  Google Scholar 

  37. Kotidis, Y., Srivastava, D., Velegrakis, Y.: Updates through views: A new hope. In: ICDE, p. 2 (2006)

  38. Lenzerini, M.: Data integration: A theoretical perspective. In: PODS, pp. 233–246 (2002)

  39. Liu, Z., He, B., Hsiao, H.-I., Chen, Y.: Efficient and scalable data evolution with column oriented databases. In: EDBT (2011)

  40. Madhavan, J., Halevy, A.Y.: Composing mappings among data sources. In: VLDB, pp. 572–583 (2003)

  41. Melnik, S., Rahm, E., Bernstein, P.A.: Rondo: A programming platform for generic model management. In: SIGMOD (2003)

  42. Miller, R.J., Ioannidis, Y.E., Ramakrishnan, R.: The use of information capacity in schema integration and translation. In: VLDB, pp. 120–133 (1993)

  43. Miller, R.J., Ioannidis, Y.E., Ramakrishnan, R.: Schema equivalence in heterogeneous systems: Bridging theory and practice. Inf. Syst. 19(1), 3–31 (1994)

    Article  Google Scholar 

  44. Moon, H.J., Curino, C., Deutsch, A., Hou, C.-Y., Zaniolo, C.: Managing and querying transaction-time databases under schema evolution. PVLDB 1(1), 882–895 (2008)

    Google Scholar 

  45. Moon, H.J., Curino, C., Zaniolo, C.: Scalable architecture and query optimization for transaction-time dbs with evolving schemas. In: SIGMOD Conference, pp. 207–218 (2010)

  46. Moroni, F.: Schema Evolution Toolsuite: Analysis and Interpretation of Relational Schema Changes. Master’s thesis, Politecnico di Milano—Dipartimento di Elettronica e Informazione (2009)

  47. Papastefanatos, G., Vassiliadis, P., Simitsis, A., Vassiliou, Y.: Hecataeus: Regulating schema evolution. In: ICDE, pp. 1181–1184, March (2010)

  48. Qian, L., LeFevre, K., Jagadish, H.V.: Crius: User-friendly database design. PVLDB 4(2), 81–92 (2010)

    Google Scholar 

  49. Ra, Y.-G.: Relational schema evolution for program independency. In: Proceedings of the 7th international conference on Intelligent Information Technology, pp 273–281, Springer, Heidelberg (2004). doi:10.1007/978-3-540-30561-3_29

  50. Terwilliger, J.F., Bernstein, P.A., Unnithan, A.: Worry-free database upgrades: Automated model-driven evolution of schemas and complex mappings. In: SIGMOD Conference (2010)

  51. Terwilliger, J.F., Fernández-Moctezuma, R., Delcambre, L.M.L., Maier, D.: Support for schema evolution in data stream management systems. J. UCS 16(20), 3073–3101 (2010)

    MATH  Google Scholar 

  52. Ullman, J.: Principles of Database System. Computer Science Press, Rockville (1982)

    Google Scholar 

  53. Ullman, J.D.: Information integration using logical views. Theor. Comput. Sci. 239(2), 189–210 (2000)

    Article  MathSciNet  MATH  Google Scholar 

  54. Velegrakis, Y., Miller, R.J., Popa, L.: Mapping adaptation under evolving schemas. In: VLDB, pp. 584–595 (2003)

  55. Wikimedia Foundation. Wikipedia, the free encyclopedia http://en.wikipedia.org/, 2007 (Online)

  56. Wikimedia Foundation. The mediawiki http://www.mediawiki.org, 2008

  57. Yu, C., Popa, L.: Semantic adaptation of schema mappings when schemas evolve. In: VLDB, pp. 1006–1017 (2005)

Download references

Acknowledgments

The authors would like to thank Fabrizio Moroni, MyungWon Ham for their help in developing the tool, and Letizia Tanca for the great feedback and support.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Carlo Curino.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Curino, C., Moon, H.J., Deutsch, A. et al. Automating the database schema evolution process. The VLDB Journal 22, 73–98 (2013). https://doi.org/10.1007/s00778-012-0302-x

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00778-012-0302-x

Keywords

Navigation