Abstract
During the development of NoSQL-backed software, the database schema evolves naturally alongside the application code. Especially in agile development, new application releases are deployed frequently. Eventually, decisions have to be made regarding the migration of versioned legacy data which is persisted in the cloud-hosted production database. We address this schema evolution problem and present results by means of which software project stakeholders can manage the operative costs for schema evolution and adapt their software release strategy accordingly in order to comply with service-level agreements regarding the competing metrics of migration costs and latency. We clarify conclusively how schema evolution in NoSQL databases impacts these metrics while taking all relevant characteristics of migration scenarios into account. As calculating all combinatorics in the search space of migration scenarios by far exceeds computational means, we use a probabilistic Monte Carlo method of repeated sampling, serving as a well-established method to bring the complexity of schema evolution under control.
This work has been funded by the German Research Foundation (DFG, grant #385808805). An extended version of this paper is available as a preprint [20].
Notes
1. Writing an entity currently costs USD 0.108 per 100,000 documents for regional location pricing in North America as of July 21, 2021 (see https://cloud.google.com/datastore/pricing). Not all schema changes add properties to the entities, yet we assume this rough estimate, as there are both cheaper schema changes (deletes) and more expensive schema changes (reorganizing properties affecting multiple types).
2. More details and illustrations can be found in the long version of this paper [20].
3. In case of the incremental strategy, the snapshots of latency after 12 releases do not fully represent all releases. Thus, we projected the values in Fig. 5 accordingly.
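The rough per-write estimate in note 1 can be turned into a back-of-the-envelope calculation. The following Python sketch is illustrative only: the function name `migration_write_cost`, the entity count, and the assumption that every entity is rewritten exactly once per release are hypothetical and not taken from the paper. Only the price of USD 0.108 per 100,000 writes comes from note 1.

```python
from decimal import Decimal

# Price quoted in note 1 (regional North America pricing, July 21, 2021).
PRICE_PER_100K_WRITES_USD = Decimal("0.108")


def migration_write_cost(num_entities: int, releases: int = 1) -> Decimal:
    """Estimate the USD write cost of migrating legacy entities.

    Assumption (hypothetical, for illustration): every entity is
    rewritten exactly once per release, and each rewrite is billed
    as a single entity write at the price from note 1.
    """
    if num_entities < 0 or releases < 0:
        raise ValueError("num_entities and releases must be non-negative")
    total_writes = num_entities * releases
    return PRICE_PER_100K_WRITES_USD * total_writes / 100_000


if __name__ == "__main__":
    # Hypothetical scenario: 10 million entities, 12 releases.
    print(migration_write_cost(10_000_000, releases=12))
```

`Decimal` is used instead of `float` so that the currency arithmetic stays exact. Strategies that rewrite only part of the data per release would call for a smaller effective write count than this worst-case assumption.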
References
1. 3T Software Labs Ltd.: MongoDB Trends Report. Cambridge, U.K. (2020)
2. Aggarwal, C.C., Yu, P.S.: A survey of uncertain data algorithms and applications. IEEE Trans. Knowl. Data Eng. 21(5), 609–623 (2009)
3. Aulbach, S., Jacobs, D., Kemper, A., Seibold, M.: A comparison of flexible schemas for software as a service. In: Proceedings of the SIGMOD 2009. ACM (2009)
4. Barker, S., Chi, Y., Moon, H.J., Hacigümüş, H., Shenoy, P.: “Cut me some slack”: latency-aware live migration for databases. In: Proceedings of the EDBT 2012 (2012)
5. Bertino, E., Guerrini, G., Mesiti, M., Tosetto, L.: Evolving a set of DTDs according to a dynamic set of XML documents. In: Proceedings of the EDBT 2002 Workshops (2002)
6. Chen, J., Jindel, S., Walzer, R., Sen, R., Jimsheleishvilli, N., Andrews, M.: The MemSQL query optimizer: a modern optimizer for real-time analytics in a distributed database. Proc. VLDB Endow. 9(13), 1401–1412 (2016)
7. Cleve, A., Gobert, M., Meurice, L., Maes, J., Weber, J.: Understanding database schema evolution. Sci. Comput. Program. 97(P1), 113–121 (2015)
8. Curino, C., et al.: Relational cloud: a database-as-a-service for the cloud. In: Proceedings of the CIDR 2011 (2011)
9. Curino, C., Moon, H.J., Deutsch, A., Zaniolo, C.: Automating the database schema evolution process. VLDB J. 22(1), 73–98 (2013)
10. Curino, C., Moon, H.J., Tanca, L., Zaniolo, C.: Schema evolution in Wikipedia - toward a web information system benchmark. In: Proceedings of the ICEIS 2008 (2008)
11. Dean, J., Barroso, L.A.: The tail at scale. Commun. ACM 56(2), 74–80 (2013)
12. Ellison, M., Calinescu, R., Paige, R.F.: Evaluating cloud database migration options using workload models. J. Cloud Comput. 7(1), 1–18 (2018). https://doi.org/10.1186/s13677-018-0108-5
13. Fishman, G.: Monte Carlo: Concepts, Algorithms, and Applications. Springer Series in Operations Research and Financial Engineering, Springer, Heidelberg (2013)
14. Goeminne, M., Decan, A., Mens, T.: Co-evolving code-related and database-related changes in a data-intensive software system. In: 2014 Software Evolution Week - IEEE Conference on Software Maintenance, Reengineering, and Reverse Engineering (CSMR-WCRE) (2014)
15. Guerrini, G., Mesiti, M., Rossi, D.: Impact of XML schema evolution on valid documents. In: Proceedings of the WIDM 2005 Workshop. ACM (2005)
16. Haas, P.J.: Monte Carlo methods for uncertain data. In: Liu, L., Özsu, M.T. (eds.) Encyclopedia of Database Systems. Springer, New York (2018). https://doi.org/10.1007/978-1-4899-7993-3_80692-2
17. Herrmann, K., Voigt, H., Behrend, A., Rausch, J., Lehner, W.: Living in parallel realities: co-existing schema versions with a bidirectional database evolution language. In: Proceedings of the SIGMOD 2017. ACM (2017)
18. Hillenbrand, A., Levchenko, M., Störl, U., Scherzinger, S., Klettke, M.: MigCast: Putting a price tag on data model evolution in NoSQL data stores. In: Proceedings of the SIGMOD 2019. ACM (2019)
19. Hillenbrand, A., Störl, U., Levchenko, M., Nabiyev, S., Klettke, M.: Towards self-adapting data migration in the context of schema evolution in NoSQL databases. In: Proceedings of the ICDE 2020 Workshops. IEEE (2020)
20. Hillenbrand, A., Störl, U., Nabiyev, S., Scherzinger, S.: MigCast in Monte Carlo: The Impact of Data Model Evolution in NoSQL Databases. CoRR abs/2104.11787 (2021)
21. Jampani, R., Xu, F., Wu, M., Perez, L., Jermaine, C., Haas, P.J.: The Monte Carlo database system: stochastic analysis close to the data. ACM TODS 36(3), 1–41 (2011)
22. Klettke, M., Störl, U., Shenavai, M., Scherzinger, S.: NoSQL schema evolution and big data migration at scale. In: Proceedings of the SCDM 2016. IEEE (2016)
23. Levandoski, J.J., Larson, P., Stoica, R.: Identifying hot and cold data in main-memory databases. In: Proceedings of the ICDE 2013. IEEE (2013)
24. MacKay, D.J.C.: Information Theory, Inference and Learning Algorithms. Cambridge University Press, Cambridge, USA (2003)
25. Meurice, L., Cleve, A.: Supporting schema evolution in schema-less NoSQL data stores. In: Proceedings of the SANER 2017 (2017)
26. Qiu, D., Li, B., Su, Z.: An empirical analysis of the co-evolution of schema and code in database applications. In: Proceedings of the SIGSOFT 2013. ACM (2013)
27. Saur, K., Dumitras, T., Hicks, M.W.: Evolving NoSQL databases without downtime. In: Proceedings of the ICSME 2016. IEEE (2016)
28. Scherzinger, S., Sidortschuck, S.: An empirical study on the design and evolution of NoSQL database schemas. In: Dobbie, G., Frank, U., Kappel, G., Liddle, S.W., Mayr, H.C. (eds.) ER 2020. LNCS, vol. 12400, pp. 441–455. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-62522-1_33
29. Stonebraker, M.: My top ten fears about the DBMS field. In: Proceedings of the ICDE 2018. IEEE (2018)
30. Suciu, D., Olteanu, D., Ré, C., Koch, C.: Probabilistic databases. Synth. Lect. Data Manag. 3(2), 1–180 (2011)
31. Vassiliadis, P., Zarras, A., Skoulis, I.: Gravitating to rigidity: patterns of schema evolution -and its absence- in the lives of tables. Inf. Syst. 63, 24–46 (2016)
32. Li, X.: A survey of schema evolution in object-oriented databases. In: Proceedings of the TOOLS 1999. IEEE (1999)
Acknowledgments
We thank Tobias Kreiter, Shamil Nabiyev, Maksym Levchenko, and Jan-Christopher Mair for their contributions to MigCast.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Hillenbrand, A., Scherzinger, S., Störl, U. (2021). Remaining in Control of the Impact of Schema Evolution in NoSQL Databases. In: Ghose, A., Horkoff, J., Silva Souza, V.E., Parsons, J., Evermann, J. (eds) Conceptual Modeling. ER 2021. Lecture Notes in Computer Science, vol 13011. Springer, Cham. https://doi.org/10.1007/978-3-030-89022-3_13
DOI: https://doi.org/10.1007/978-3-030-89022-3_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-89021-6
Online ISBN: 978-3-030-89022-3
eBook Packages: Computer Science, Computer Science (R0)