Automating the database schema evolution process

Curino, Carlo; Moon, Hyun Jin; Deutsch, Alin; Zaniolo, Carlo

doi:10.1007/s00778-012-0302-x

Automating the database schema evolution process

Special Issue Paper
Published: 28 December 2012

Volume 22, pages 73–98, (2013)
Cite this article

The VLDB Journal Aims and scope Submit manuscript

Carlo Curino¹,
Hyun Jin Moon²,
Alin Deutsch³ &
…
Carlo Zaniolo⁴

2655 Accesses
79 Citations
6 Altmetric
Explore all metrics

Abstract

Supporting database schema evolution represents a long-standing challenge of practical and theoretical importance for modern information systems. In this paper, we describe techniques and systems for automating the critical tasks of migrating the database and rewriting the legacy applications. In addition to labor saving, the benefits delivered by these advances are many and include reliable prediction of outcome, minimization of downtime, system-produced documentation, and support for archiving, historical queries, and provenance. The PRISM/PRISM++ system delivers these benefits, by solving the difficult problem of automating the migration of databases and the rewriting of queries and updates. In this paper, we present the PRISM/PRISM++ system and the novel technology that made it possible. In particular, we focus on the difficult and previously unsolved problem of supporting legacy queries and updates under schema and integrity constraints evolution. The PRISM/PRISM++ approach consists in providing the users with a set of SQL-based Schema Modification Operators (SMOs), which describe how the tables in the old schema are modified into those in the new schema. In order to support updates, SMOs are extended with integrity constraints modification operators. By using recent results on schema mapping, the paper (i) characterizes the impact on integrity constraints of structural schema changes, (ii) devises representations that enable the rewriting of updates, and (iii) develop a unified approach for query and update rewriting under constraints. We complement the system with two novel tools: the first automatically collects and provides statistics on schema evolution histories, whereas the second derives equivalent sequences of SMOs from the migration scripts that were used for schema upgrades. These tools were used to produce an extensive testbed containing 15 evolution histories of scientific databases and web information systems, providing over 100 years of aggregate evolution histories and almost 2,000 schema evolution steps.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

DB-GPT: Large Language Model Meets Database

Article Open access 19 January 2024

Design methods for the new database era: a systematic literature review

Article 27 June 2019

MongoDB Vs PostgreSQL: A comparative study on performance aspects

Article Open access 05 June 2020

Notes

Source: http://www.businessintelligencelowdown.com/2007/02/top_10_largest_.html.
From the SVN commit 5552 accessible at: http://svn.wikimedia.org/viewvc/mediawiki?view=rev&revision=5552.
PRISM is an acronym for Panta Rhei Information & Schema Manager—‘Panta Rhei’ (Everything is in flux) is often credited to Heraclitus. The project homepage is as follows: http://yellowstone.cs.ucla.edu/schema-evolution/index.php/Prism.
This is an adaptation of the classical view-update semantics [14, 25, 37] to our context, in which evolution operators replace the views.
See: http://tinyurl.com/updaterewriting.
See Ensembl CVS repository at: http://tinyurl.com/ensembl-schema.
An \(exon\) is a nucleic acid sequence related to a portion of DNA.
This information is derived from the CVS logs and from the SQL used for data migration.
These are simple equality assertions about the value of a column and constants, supported by the SQL DDL.
Note that we apply the definition only for the case when \(M\) is a functional mapping. This suffices in our context since we force evolution operators to be invertible (as explained below). In general, however, classical schema mappings [34] may associate several possible \(S_2\)-instances with a given \(S_1\)-instance.
This process is semi-automatic, and the user is guided by the system in the selection –at evolution time, not at query rewriting time– of the inverse for each SMO [23].
Note that some of the updates will now fail due to the stricter constraints. This is unavoidable to maintain the DB instance \(I_2\) consistent with \(IC_2\) and is in general well-accepted consequence of tightening constraints.
This policy is only available for rewriting purposes, that is, for inverses of ICMOs, since the use for data migration would lead to an inconsistent DB instance: \(I_2 \not \models IC_2\).
Note that the evolution is information preserving: (forward) thanks to the primary key on id, and (inverse) since the system automatically declares the integrity constraints valid in the output Footnote 14 continued
Fig. 9
Update rewriting through SMO
Full size image

schema (two primary keys on the id columns, and two cross foreign keys).
We release the two datasets at: http://db.csail.mit.edu/wikipedia/ and http://db.csail.mit.edu/ensembldb/.

References

http://publib.boulder.ibm.com/infocenter/mptoolic/v1r0/index.jsp?topic=/com.ibm.db2tools.chx.doc.ug/chxucoview01.htm
http://www.embarcadero.com/products/db-change-manager
http://www.idera.com/SQL-Server/
http://www.liquibase.org/
http://www.mysql.com/products/workbench/
http://www.oracle.com/us/products/enterprise-manager/change-management-pack-11g-ds-068451.pdf
http://www.red-gate.com/
http://www.sqledit.com/index.html
http://www.swissql.com/
Abiteboul, S., Duschka, O.M.: Complexity of answering queries using materialized views. In: PODS, pp. 254–263 (1998)
Abiteboul, S., Hull, R., Vianu, V.: Foundations of Databases. Addison Wesley, Reading (1995)
MATH Google Scholar
Afrati, F.N., Kolaitis, P.G.: Repair checking in inconsistent databases: Algorithms and complexity. In: ICDT, pp. 31–41 (2009)
Arenas, M., Bertossi, L., Chomicki, J.: Consistent query answers in inconsistent databases. In: PODS, pp. 68–79 (1999)
Bancilhon, F., Spyratos, N.: Update semantics of relational views. ACM Trans. Database Syst. 6(4), 557–575 (1981)
Article MATH Google Scholar
Bernstein, P.A.: Applying model management to classical meta data problems. In: CIDR (2003)
Bernstein, P.A., Green, T.J., Melnik, S., Nash, A.: Implementing mapping composition. VLDB J. 17(2), 333–353 (2008)
Article Google Scholar
Bohannon, A., Pierce, B.C., Vaughan, J.A.: Relational lenses: A language for updatable views. In: PODS, pp. 338–347 (2006)
Cleve A., Hainaut, J.-L.: Co-transformations in database applications evolution. In: GTTSE, pp. 409–421 (2006)
Curino, C., Ham, M., Moroni, F., Zaniolo, C.: Pantha rei data set. http://data.schemaevolution.org/ (2009)
Curino, C., Moon, H.J., Deutsch, A., Zaniolo, C.: Update rewriting and integrity constraint maintenance in a schema evolution support system: Prism++. PVLDB 4(2), 117–128 (2010)
Google Scholar
Curino, C., Moon, H.J., Ham, M., Zaniolo, C.: The prism workbench: Database schema evolution without tears. In: ICDE (2009)
Curino, C., Moon, H.J., Tanca, L., Zaniolo, C.: Schema evolution in Wikipedia: Toward a web information system benchmark. ICEIS (2008)
Curino, C., Moon, H.J., Zaniolo, C.: Graceful database schema evolution: The prism workbench. PVLDB 1(1), 761–772 (2008)
Google Scholar
Curino, C., Moon, H.J.: C. Zaniolo. Managing the history of metadata in support for db archiving and schema evolution. In: ECDM (2008)
Dayal, U., Bernstein, P.A.: On the correct translation of update operations on relational views. ACM Trans. Database Syst. 7(3), 381–416 (1982)
Article MathSciNet MATH Google Scholar
Deutsch, A., Nash, A., Remmel, J.: The chase revisited. In: PODS, pp. 149–158 (2008)
Deutsch, A., Tannen, V.: Mars: A system for publishing xml from mixed and redundant storage. In: VLDB, pp. 201–212 (2003)
Ensembl development team. Ensembl Genetic DB http://www.ensembl.org, 2009 (Online)
Fagin, R.: Inverting schema mappings. ACM Trans. Database Syst. 32(4), 25:1–25:51 (2007)
Google Scholar
Fagin, R., Kolaitis, P.G., Popa, L., Tan, W.-C.: Composing schema mappings: Second-order dependencies to the rescue. ACM Trans. Database Syst. 30(4), 994–1055 (2005)
Article Google Scholar
Fagin, R., Kolaitis, P.G., Popa, L., Tan, W.-C.: Quasi-inverses of schema mappings. In: PODS, pp. 123–132 (2007)
Fagin, R., Kolaitis, P.G., Popa, L., Tan, W. C.: Reverse data exchange: Coping with nulls. In: PODS, pp. 23–32 (2009)
Hartung, M., Terwilliger, J.F., Rahm, E.: Recent advances in schema and ontology evolution. In: Schema Matching and Mapping, pp. 149–190 (2011)
Hernández, M.A., Miller, R.J., Haas, L.M.: Clio: A semi-automatic tool for schema mapping. In: SIGMOD, p. 607 (2001)
Hick, J.-M., Hainaut, J.-L.: Database application evolution: A transformational approach. Data Knowl. Eng. 59(3), 534–558 (2006)
Article Google Scholar
Hull, R.: Non-finite specifiability of projections of functional dependency families. Theor. Comput. Sci. 39, 239–265 (1985)
Article MathSciNet MATH Google Scholar
Kotidis, Y., Srivastava, D., Velegrakis, Y.: Updates through views: A new hope. In: ICDE, p. 2 (2006)
Lenzerini, M.: Data integration: A theoretical perspective. In: PODS, pp. 233–246 (2002)
Liu, Z., He, B., Hsiao, H.-I., Chen, Y.: Efficient and scalable data evolution with column oriented databases. In: EDBT (2011)
Madhavan, J., Halevy, A.Y.: Composing mappings among data sources. In: VLDB, pp. 572–583 (2003)
Melnik, S., Rahm, E., Bernstein, P.A.: Rondo: A programming platform for generic model management. In: SIGMOD (2003)
Miller, R.J., Ioannidis, Y.E., Ramakrishnan, R.: The use of information capacity in schema integration and translation. In: VLDB, pp. 120–133 (1993)
Miller, R.J., Ioannidis, Y.E., Ramakrishnan, R.: Schema equivalence in heterogeneous systems: Bridging theory and practice. Inf. Syst. 19(1), 3–31 (1994)
Article Google Scholar
Moon, H.J., Curino, C., Deutsch, A., Hou, C.-Y., Zaniolo, C.: Managing and querying transaction-time databases under schema evolution. PVLDB 1(1), 882–895 (2008)
Google Scholar
Moon, H.J., Curino, C., Zaniolo, C.: Scalable architecture and query optimization for transaction-time dbs with evolving schemas. In: SIGMOD Conference, pp. 207–218 (2010)
Moroni, F.: Schema Evolution Toolsuite: Analysis and Interpretation of Relational Schema Changes. Master’s thesis, Politecnico di Milano—Dipartimento di Elettronica e Informazione (2009)
Papastefanatos, G., Vassiliadis, P., Simitsis, A., Vassiliou, Y.: Hecataeus: Regulating schema evolution. In: ICDE, pp. 1181–1184, March (2010)
Qian, L., LeFevre, K., Jagadish, H.V.: Crius: User-friendly database design. PVLDB 4(2), 81–92 (2010)
Google Scholar
Ra, Y.-G.: Relational schema evolution for program independency. In: Proceedings of the 7th international conference on Intelligent Information Technology, pp 273–281, Springer, Heidelberg (2004). doi:10.1007/978-3-540-30561-3_29
Terwilliger, J.F., Bernstein, P.A., Unnithan, A.: Worry-free database upgrades: Automated model-driven evolution of schemas and complex mappings. In: SIGMOD Conference (2010)
Terwilliger, J.F., Fernández-Moctezuma, R., Delcambre, L.M.L., Maier, D.: Support for schema evolution in data stream management systems. J. UCS 16(20), 3073–3101 (2010)
MATH Google Scholar
Ullman, J.: Principles of Database System. Computer Science Press, Rockville (1982)
Google Scholar
Ullman, J.D.: Information integration using logical views. Theor. Comput. Sci. 239(2), 189–210 (2000)
Article MathSciNet MATH Google Scholar
Velegrakis, Y., Miller, R.J., Popa, L.: Mapping adaptation under evolving schemas. In: VLDB, pp. 584–595 (2003)
Wikimedia Foundation. Wikipedia, the free encyclopedia http://en.wikipedia.org/, 2007 (Online)
Wikimedia Foundation. The mediawiki http://www.mediawiki.org, 2008
Yu, C., Popa, L.: Semantic adaptation of schema mappings when schemas evolve. In: VLDB, pp. 1006–1017 (2005)

Download references

Acknowledgments

The authors would like to thank Fabrizio Moroni, MyungWon Ham for their help in developing the tool, and Letizia Tanca for the great feedback and support.

Author information

Authors and Affiliations

Microsoft, Mountain View, CA, USA
Carlo Curino
Google Inc., Mountain View, CA, USA
Hyun Jin Moon
UCSD, La Jolla, CA, USA
Alin Deutsch
UCLA, Los Angeles, CA, USA
Carlo Zaniolo

Authors

Carlo Curino
View author publications
You can also search for this author in PubMed Google Scholar
Hyun Jin Moon
View author publications
You can also search for this author in PubMed Google Scholar
Alin Deutsch
View author publications
You can also search for this author in PubMed Google Scholar
Carlo Zaniolo
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Carlo Curino.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Curino, C., Moon, H.J., Deutsch, A. et al. Automating the database schema evolution process. The VLDB Journal 22, 73–98 (2013). https://doi.org/10.1007/s00778-012-0302-x

Download citation

Received: 12 January 2012
Revised: 10 November 2012
Accepted: 26 November 2012
Published: 28 December 2012
Issue Date: February 2013
DOI: https://doi.org/10.1007/s00778-012-0302-x

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Automating the database schema evolution process

Abstract

Access this article

Similar content being viewed by others

DB-GPT: Large Language Model Meets Database

Design methods for the new database era: a systematic literature review

MongoDB Vs PostgreSQL: A comparative study on performance aspects

Notes

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Automating the database schema evolution process

Abstract

Access this article

Similar content being viewed by others

DB-GPT: Large Language Model Meets Database

Design methods for the new database era: a systematic literature review

MongoDB Vs PostgreSQL: A comparative study on performance aspects

Notes

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation