Remaining in Control of the Impact of Schema Evolution in NoSQL Databases

  • Conference paper
  • Conceptual Modeling (ER 2021)

Abstract

During the development of NoSQL-backed software, the database schema evolves naturally alongside the application code. Especially in agile development, new application releases are deployed frequently. Eventually, decisions have to be made about how to migrate the versioned legacy data that is persisted in the cloud-hosted production database. We address this schema evolution problem and present results that allow software project stakeholders to manage the operative costs of schema evolution. Stakeholders can use these results to adapt their software release strategy so that they comply with service-level agreements on the competing metrics of migration costs and latency. We clarify conclusively how schema evolution in NoSQL databases impacts these metrics, taking all relevant characteristics of migration scenarios into account. Computing all combinations in the search space of migration scenarios far exceeds the available computational resources. We therefore use a probabilistic Monte Carlo method of repeated sampling, a well-established technique, to bring the complexity of schema evolution under control.
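A minimal Python sketch can illustrate the repeated-sampling idea behind the Monte Carlo method. It is not the authors' MigCast implementation. The scenario model is a hypothetical assumption: a fixed number of entities is added per release, each sampled scenario has its own probability that a release changes the schema, and eager migration rewrites every stored entity on each schema change. The parameter ranges are assumptions too, and so are the release count of 12 and the function names. Only the write price is taken from note 1.

```python
import random
import statistics

# Write price from note 1: USD 0.108 per 100,000 entity writes
# (Google Cloud Datastore, regional pricing in North America, July 2021).
PRICE_PER_WRITE_USD = 0.108 / 100_000


def sample_eager_cost(rng: random.Random, releases: int,
                      new_entities_per_release: int) -> float:
    """Simulate one randomly drawn migration scenario (hypothetical model).

    Each release adds new entities. With a probability that is itself sampled
    per scenario, the release also changes the schema, and eager migration
    then rewrites every entity stored so far.
    """
    change_probability = rng.uniform(0.2, 0.8)  # assumed range, not from the paper
    stored = 0
    cost = 0.0
    for _ in range(releases):
        stored += new_entities_per_release
        if rng.random() < change_probability:
            cost += stored * PRICE_PER_WRITE_USD  # one write per stored entity
    return cost


def estimate_cost(samples: int, releases: int = 12,
                  new_entities_per_release: int = 100_000,
                  seed: int = 0) -> tuple[float, float]:
    """Monte Carlo estimate: mean and standard deviation over repeated samples."""
    rng = random.Random(seed)  # seeded for reproducible estimates
    costs = [sample_eager_cost(rng, releases, new_entities_per_release)
             for _ in range(samples)]
    return statistics.mean(costs), statistics.stdev(costs)


if __name__ == "__main__":
    mean, sd = estimate_cost(samples=10_000)
    print(f"estimated eager migration cost: USD {mean:.2f} (sd {sd:.2f})")
```

Drawing more samples makes the estimated mean more reliable. Its standard error shrinks roughly with the square root of the sample count. A latency metric would need its own, equally hypothetical model, and so would other migration strategies such as the incremental strategy mentioned in note 3.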

This work has been funded by the German Research Foundation (DFG, grant #385808805). An extended version of this paper is available as a preprint [20].

Notes

  1. Writing entities costs USD 0.108 per 100,000 entities for regional location pricing in North America as of July 21, 2021 (see https://cloud.google.com/datastore/pricing). Not all schema changes add properties to the entities. We nevertheless assume this rough estimate, since there are both cheaper schema changes (deletes) and more expensive ones (reorganizing properties across multiple types).

  2. More details and illustrations can be found in the long version of this paper [20].

  3. For the incremental strategy, the latency snapshots taken after 12 releases do not fully represent all releases. We therefore projected the values in Fig. 5 accordingly.
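The rough estimate in note 1 amounts to simple arithmetic. Migrating n entities once costs about n × 0.108 / 100,000 USD, assuming one write per migrated entity. The helper below illustrates that calculation using the July 2021 price quoted in the note. Its name is hypothetical, and current prices may differ.

```python
# Price quoted in note 1: USD 0.108 per 100,000 entity writes
# (Google Cloud Datastore, regional North America, July 21, 2021).
PRICE_PER_100K_WRITES_USD = 0.108


def migration_write_cost_usd(entities: int) -> float:
    """Hypothetical helper: cost of rewriting `entities` legacy entities once.

    Follows the rough assumption of note 1 that a schema change costs one
    write per migrated entity.
    """
    if entities < 0:
        raise ValueError("entity count must be non-negative")
    return entities * PRICE_PER_100K_WRITES_USD / 100_000


if __name__ == "__main__":
    # Eagerly migrating 10 million legacy entities once:
    print(f"USD {migration_write_cost_usd(10_000_000):.2f}")  # prints "USD 10.80"
```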

References

  1. 3T Software Labs Ltd.: MongoDB Trends Report. Cambridge, U.K. (2020)

  2. Aggarwal, C.C., Yu, P.S.: A survey of uncertain data algorithms and applications. IEEE Trans. Knowl. Data Eng. 21(5), 609–623 (2009)

  3. Aulbach, S., Jacobs, D., Kemper, A., Seibold, M.: A comparison of flexible schemas for software as a service. In: Proceedings of the SIGMOD 2009. ACM (2009)

  4. Barker, S., Chi, Y., Moon, H.J., Hacigümüş, H., Shenoy, P.: “Cut me some slack” latency-aware live migration for databases. In: Proceedings of the EDBT 2012 (2012)

  5. Bertino, E., Guerrini, G., Mesiti, M., Tosetto, L.: Evolving a set of DTDs according to a dynamic set of XML documents. In: Proceedings of the EDBT 2002 Workshops (2002)

  6. Chen, J., Jindel, S., Walzer, R., Sen, R., Jimsheleishvilli, N., Andrews, M.: The MemSQL query optimizer: a modern optimizer for real-time analytics in a distributed database. Proc. VLDB Endow. 9(13), 1401–1412 (2016)

  7. Cleve, A., Gobert, M., Meurice, L., Maes, J., Weber, J.: Understanding database schema evolution. Sci. Comput. Program. 97(P1), 113–121 (2015)

  8. Curino, C., et al.: Relational cloud: a database-as-a-service for the cloud. In: Proceedings of the CIDR 2011 (2011)

  9. Curino, C., Moon, H.J., Deutsch, A., Zaniolo, C.: Automating the database schema evolution process. VLDB J. 22(1), 73–98 (2013)

  10. Curino, C., Moon, H.J., Tanca, L., Zaniolo, C.: Schema evolution in Wikipedia: toward a web information system benchmark. In: Proceedings of the ICEIS 2008 (2008)

  11. Dean, J., Barroso, L.A.: The tail at scale. Commun. ACM 56(2), 74–80 (2013)

  12. Ellison, M., Calinescu, R., Paige, R.F.: Evaluating cloud database migration options using workload models. J. Cloud Comput. 7(1), 1–18 (2018). https://doi.org/10.1186/s13677-018-0108-5

  13. Fishman, G.: Monte Carlo: Concepts, Algorithms, and Applications. Springer Series in Operations Research and Financial Engineering, Springer, Heidelberg (2013)

  14. Goeminne, M., Decan, A., Mens, T.: Co-evolving code-related and database-related changes in a data-intensive software system. In: 2014 Software Evolution Week - IEEE Conference on Software Maintenance, Reengineering, and Reverse Engineering (CSMR-WCRE) (2014)

  15. Guerrini, G., Mesiti, M., Rossi, D.: Impact of XML schema evolution on valid documents. In: Proceedings of the WIDM 2005 Workshop. ACM (2005)

  16. Haas, P.J.: Monte Carlo methods for uncertain data. In: Liu, L., Özsu, M.T. (eds.) Encyclopedia of Database Systems. Springer, New York (2018). https://doi.org/10.1007/978-1-4899-7993-3_80692-2

  17. Herrmann, K., Voigt, H., Behrend, A., Rausch, J., Lehner, W.: Living in parallel realities: co-existing schema versions with a bidirectional database evolution language. In: Proceedings of the SIGMOD 2017. ACM (2017)

  18. Hillenbrand, A., Levchenko, M., Störl, U., Scherzinger, S., Klettke, M.: MigCast: Putting a price tag on data model evolution in NoSQL data stores. In: Proceedings of the SIGMOD 2019. ACM (2019)

  19. Hillenbrand, A., Störl, U., Levchenko, M., Nabiyev, S., Klettke, M.: Towards self-adapting data migration in the context of schema evolution in NoSQL databases. In: Proceedings of the ICDE 2020 Workshops. IEEE (2020)

  20. Hillenbrand, A., Störl, U., Nabiyev, S., Scherzinger, S.: MigCast in Monte Carlo: The Impact of Data Model Evolution in NoSQL Databases. CoRR abs/2104.11787 (2021)

  21. Jampani, R., Xu, F., Wu, M., Perez, L., Jermaine, C., Haas, P.J.: The Monte Carlo database system: stochastic analysis close to the data. ACM TODS 36(3), 1–41 (2011)

  22. Klettke, M., Störl, U., Shenavai, M., Scherzinger, S.: NoSQL schema evolution and big data migration at scale. In: Proceedings of the SCDM 2016. IEEE (2016)

  23. Levandoski, J.J., Larson, P., Stoica, R.: Identifying hot and cold data in main-memory databases. In: Proceedings of the ICDE 2013. IEEE (2013)

  24. MacKay, D.J.C.: Information Theory, Inference and Learning Algorithms. Cambridge University Press, Cambridge, USA (2003)

  25. Meurice, L., Cleve, A.: Supporting schema evolution in schema-less NoSQL data stores. In: Proceedings of the SANER 2017 (2017)

  26. Qiu, D., Li, B., Su, Z.: An empirical analysis of the co-evolution of schema and code in database applications. In: Proceedings of the SIGSOFT 2013. ACM (2013)

  27. Saur, K., Dumitras, T., Hicks, M.W.: Evolving NoSQL databases without downtime. In: Proceedings of the ICSME 2016. IEEE (2016)

  28. Scherzinger, S., Sidortschuck, S.: An empirical study on the design and evolution of NoSQL database schemas. In: Dobbie, G., Frank, U., Kappel, G., Liddle, S.W., Mayr, H.C. (eds.) ER 2020. LNCS, vol. 12400, pp. 441–455. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-62522-1_33

  29. Stonebraker, M.: My top ten fears about the DBMS field. In: Proceedings of the ICDE 2018. IEEE (2018)

  30. Suciu, D., Olteanu, D., Ré, C., Koch, C.: Probabilistic databases. Synth. Lect. Data Manag. 3(2), 1–180 (2011)

  31. Vassiliadis, P., Zarras, A., Skoulis, I.: Gravitating to rigidity: patterns of schema evolution (and its absence) in the lives of tables. Inf. Syst. 63, 24–46 (2016)

  32. Li, X.: A survey of schema evolution in object-oriented databases. In: Proceedings of the TOOLS 1999. IEEE (1999)

Acknowledgments

We thank Tobias Kreiter, Shamil Nabiyev, Maksym Levchenko, and Jan-Christopher Mair for their contributions to MigCast.

Author information

Correspondence to Andrea Hillenbrand.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Hillenbrand, A., Scherzinger, S., Störl, U. (2021). Remaining in Control of the Impact of Schema Evolution in NoSQL Databases. In: Ghose, A., Horkoff, J., Silva Souza, V.E., Parsons, J., Evermann, J. (eds) Conceptual Modeling. ER 2021. Lecture Notes in Computer Science(), vol 13011. Springer, Cham. https://doi.org/10.1007/978-3-030-89022-3_13

  • DOI: https://doi.org/10.1007/978-3-030-89022-3_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-89021-6

  • Online ISBN: 978-3-030-89022-3

  • eBook Packages: Computer Science, Computer Science (R0)
