Managing Schema Migration in NoSQL Databases: Advisor Heuristics vs. Self-adaptive Schema Migration Strategies

  • Conference paper
Model-Driven Engineering and Software Development (MODELSWARD 2021, MODELSWARD 2022)

Abstract

Schema-flexible NoSQL databases are increasingly popular backends in agile application development, as they allow developers to write code that assumes a new database schema different from the current one. If the application is already in production, non-functional requirements for application performance and cost efficiency are routinely part of service-level agreements (SLAs). Co-evolving the schema with the application code then requires subtle management decisions regarding the migration of variational legacy data persisted in the production database. Ultimately, project managers have to deal with the repercussions of schema evolution in order to comply with SLAs, especially if the stipulated metrics compete with each other in tradeoffs. To this end, we present a NoSQL Schema Migration Advisor that supports schema migration management in NoSQL databases in two distinct ways: If the migration situation can be elicited, a heuristic is offered that estimates the impact of schema evolution, so that a migration strategy can be chosen and code releases paced accordingly. If this information is insufficient or not readily available, self-adaptive schema migration strategies are presented that can automatically curate variational data such that competing metrics can be balanced out in order to comply with SLAs where possible, making management interventions superfluous.
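To make the choice of migration strategy concrete, the two basic strategies commonly discussed in this line of work can be sketched side by side: eager migration (migrate all legacy entities at release time) and lazy migration (migrate an entity only when the application accesses it). This is a minimal illustration under stated assumptions, not the advisor's or MigCast's implementation: the in-memory `ToyStore`, the `_v` schema-version property, the rename-style schema change `rename_field`, and counting every read and every write as one I/O request are all hypothetical.

```python
"""Toy sketch: eager vs. lazy migration of variational data in a
schema-flexible document store. All names are hypothetical."""
from dataclasses import dataclass, field


@dataclass
class ToyStore:
    """In-memory stand-in for a NoSQL store that counts I/O requests."""
    docs: dict = field(default_factory=dict)  # entity id -> document
    io_requests: int = 0  # every read and every write counts as one

    def read(self, key):
        self.io_requests += 1
        return self.docs[key]

    def write(self, key, doc):
        self.io_requests += 1
        self.docs[key] = doc


def rename_field(doc, old, new, version):
    """Hypothetical schema change: rename `old` to `new`, set schema version."""
    migrated = {k: v for k, v in doc.items() if k != old}
    migrated[new] = doc[old]
    migrated["_v"] = version
    return migrated


def migrate_eagerly(store, old, new, version):
    """Eager: migrate every legacy entity at release time."""
    for key in list(store.docs):
        doc = store.read(key)
        if doc["_v"] < version:
            store.write(key, rename_field(doc, old, new, version))


def get_lazily(store, key, old, new, version):
    """Lazy: migrate an entity only when the application reads it."""
    doc = store.read(key)
    if doc["_v"] < version:
        doc = rename_field(doc, old, new, version)
        store.write(key, doc)
    return doc


if __name__ == "__main__":
    def legacy_store(n):
        return ToyStore(docs={i: {"_v": 1, "name": f"user{i}"} for i in range(n)})

    eager = legacy_store(100)
    migrate_eagerly(eager, "name", "fullName", version=2)
    lazy = legacy_store(100)
    for key in [0, 1, 1, 2]:  # only three distinct entities are accessed
        get_lazily(lazy, key, "name", "fullName", version=2)
    print("eager I/O:", eager.io_requests)  # 100 reads + 100 writes
    print("lazy I/O:", lazy.io_requests)    # 4 reads + 3 writes
```

Eager migration pays for every legacy entity at once, whereas lazy migration spreads the cost over later accesses (adding latency to them) and never pays for entities that are not read again; an advisor heuristic or a self-adaptive strategy trades these effects off against each other.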

This work has been funded by the German Research Foundation (project grant #385808805). We thank Jan-Christopher Mair, Kai Pehns, Tobias Kreiter, Shamil Nabiyev, and Maksym Levchenko from Darmstadt University of Applied Sciences for their contributions to MigCast and Darwin.

Notes

  1. The MigCast pricing model is specified at USD 0.20 per 1M I/O requests and is based on Amazon DocumentDB (AWS) pricing for US-East. It can be viewed at https://aws.amazon.com/en/documentdb/pricing/, accessed on February 2, 2022.

  2. We have previously analyzed options for self-adaptation from a theoretical standpoint in [19, 20] and presented the prototype in [18].

  3. The distribution of the served workload of entity accesses and the distribution and kinds of schema modification operations (SMOs) are randomized in MigCast within the specified bounds, in this case a Pareto-distributed workload of medium intensity and a high ratio of multi-type SMOs. The cost model is chosen as described on page 3. For further details of the implementation setup, we refer to [17].

  4. Despite the relatively small scale of our example (an initial database instance of 10M entities and just 12 schema changes, each affecting part of the database), costs can easily amount to many thousands of USD and increase exponentially due to numerous influencing factors [17].

  5. In the depicted migration scenarios, the limit values for the complexity-adaptive strategy are equal to those of the predictive strategy, because its advantage only comes into play with a higher share of multi-type SMOs and a lighter, Pareto-distributed query workload.

  6. The increase is slightly exponential for a data growth rate of \(10\%\), which increases the number of entities by a constant amount per release; see the bottom left column of Table 3.

  7. This amount can be considered an upper limit, because migration costs can be assumed to grow exponentially; due to the Pareto distribution of the entity accesses, the lazy strategy needs to spend only \(40\%\) rather than the assumed \(50\%\) at release 6 in the Monte Carlo experiments.

  8. The migration can be performed as offline batch processing, as a blue-green deployment [22], or during a phase of low query workload, in which case it intermittently causes higher latency.
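The notes combine a per-request price (Note 1), a Pareto-distributed access workload (Note 3), per-release data growth (Note 6), and the lazy strategy (Note 7). The following is a rough sketch of how such quantities can be put together; it is not MigCast's cost model. It assumes that only the extra write persisting a lazily migrated entity counts as migration I/O (the read belongs to the application workload anyway), that eager migration costs one read and one write per entity, that entity ids are drawn with Python's `random.paretovariate` so that low ids are hot, and that a 10% growth rate adds 10% of the initial entity count per release. All function names and parameter values (e.g. `alpha=1.16`, 50,000 accesses) are hypothetical.

```python
"""Rough sketch (not MigCast): migration cost of lazy vs. eager after one
schema change under a Pareto-distributed access workload. All names and
parameter values are hypothetical."""
import random

PRICE_PER_MILLION_IO = 0.20  # USD per 1M I/O requests, as in Note 1


def io_cost_usd(io_requests: int) -> float:
    """Convert an I/O request count into USD."""
    return io_requests / 1_000_000 * PRICE_PER_MILLION_IO


def entities_at_release(n0: int, growth: float, release: int) -> int:
    """Assumed linear growth: each release adds growth * n0 entities."""
    return n0 + release * round(growth * n0)


def pareto_entity(rng: random.Random, n_entities: int, alpha: float) -> int:
    """Draw an entity id in [0, n_entities); low ids are accessed most."""
    while True:
        idx = int(rng.paretovariate(alpha)) - 1  # paretovariate() >= 1
        if idx < n_entities:  # reject draws beyond the last entity
            return idx


def lazy_release(n_entities: int, n_accesses: int, alpha: float,
                 seed: int = 0) -> tuple:
    """Simulate one release; return (migration I/O, share migrated).

    Each distinct accessed legacy entity costs one extra write."""
    rng = random.Random(seed)
    migrated = set()
    for _ in range(n_accesses):
        migrated.add(pareto_entity(rng, n_entities, alpha))
    return len(migrated), len(migrated) / n_entities


if __name__ == "__main__":
    n = entities_at_release(10_000, 0.10, release=1)
    lazy_io, share = lazy_release(n, n_accesses=50_000, alpha=1.16)
    eager_io = 2 * n  # assumed: eager reads and rewrites every entity once
    print(f"{n} entities, lazily migrated share: {share:.1%}")
    print(f"lazy:  {lazy_io} I/O requests = USD {io_cost_usd(lazy_io):.6f}")
    print(f"eager: {eager_io} I/O requests = USD {io_cost_usd(eager_io):.6f}")
```

Because a Pareto workload concentrates accesses on a few hot entities, the lazily migrated share after one release can stay well below 100%, which is the intuition behind Note 7; the numbers this toy produces are illustrative only and are not the paper's results.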

References

  1. 3T Software Labs Ltd.: MongoDB Trends Report. Cambridge, U.K. (2020)

  2. Aulbach, S., Jacobs, D., Kemper, A., Seibold, M.: A comparison of flexible schemas for software as a service. In: Proceedings of SIGMOD 2009. ACM (2009)

  3. Barker, S., Chi, Y., Moon, H.J., Hacigümüş, H., Shenoy, P.: “Cut me some slack”: latency-aware live migration for databases. In: Proceedings of EDBT 2012 (2012)

  4. Bertino, E., Guerrini, G., Mesiti, M., Tosetto, L.: Evolving a set of DTDs according to a dynamic set of XML documents. In: Proceedings of EDBT 2002 Workshops (2002)

  5. Cleve, A., Gobert, M., Meurice, L., Maes, J., Weber, J.: Understanding database schema evolution. Sci. Comput. Program. 97(P1) (2015)

  6. Conrad, A., Gärtner, S., Störl, U.: Towards automated schema optimization. In: ER Demos and Posters. CEUR Workshop Proceedings, vol. 2958 (2021)

  7. Curino, C., et al.: Relational cloud: a database-as-a-service for the cloud. In: Proceedings of CIDR (2011)

  8. Curino, C., Moon, H.J., Deutsch, A., Zaniolo, C.: Automating the database schema evolution process. VLDB J. 22(1), 73–98 (2013)

  9. Curino, C., Moon, H.J., Tanca, L., Zaniolo, C.: Schema evolution in Wikipedia: toward a web information system benchmark. In: Proceedings of ICEIS 2008 (2008)

  10. Difallah, D.E., Pavlo, A., Curino, C., Cudré-Mauroux, P.: OLTP-Bench: an extensible testbed for benchmarking relational databases. Proc. VLDB Endow. 7(4), 277–288 (2013)

  11. Ellison, M., Calinescu, R., Paige, R.F.: Evaluating cloud database migration options using workload models. J. Cloud Comput. 7(1), 1–18 (2018). https://doi.org/10.1186/s13677-018-0108-5

  12. Fahmideh, M., Daneshgar, F., Beydoun, G., Rabhi, F.A.: Challenges in migrating legacy software systems to the cloud. CoRR abs/2004.10724 (2020)

  13. Filho, E.R.L., de Almeida, E.C., Scherzinger, S., Herodotou, H.: Investigating automatic parameter tuning for SQL-on-Hadoop systems. Big Data Res. 25 (2021)

  14. Guerrini, G., Mesiti, M., Rossi, D.: Impact of XML schema evolution on valid documents. In: Proceedings of WIDM 2005 Workshop. ACM (2005)

  15. Herrmann, K., Voigt, H., Behrend, A., Rausch, J., Lehner, W.: Living in parallel realities: co-existing schema versions. In: Proceedings of SIGMOD (2017)

  16. Hillenbrand, A., Levchenko, M., Störl, U., Scherzinger, S., Klettke, M.: MigCast: putting a price tag on data model evolution in NoSQL data stores. In: Proceedings of SIGMOD (2019)

  17. Hillenbrand, A., Scherzinger, S., Störl, U.: Remaining in control of the impact of schema evolution in NoSQL databases. In: Proceedings of ER 2021 (2021)

  18. Hillenbrand, A., Störl, U.: Automated curation of variational data in NoSQL databases through metric-driven self-adaptive migration strategies. In: Proceedings of MODELSWARD 2022. SCITEPRESS (2022)

  19. Hillenbrand, A., Störl, U., Levchenko, M., Nabiyev, S., Klettke, M.: Towards self-adapting data migration in the context of schema evolution in NoSQL databases. In: Proceedings of ICDE 2020 Workshops. IEEE (2020)

  20. Hillenbrand, A., Störl, U., Nabiyev, S., Klettke, M.: Self-adapting data migration in the context of schema evolution in NoSQL databases. Distrib. Parallel Databases 40(1), 5–25 (2021). https://doi.org/10.1007/s10619-021-07334-1

  21. Hillenbrand, A., Störl, U., Nabiyev, S., Scherzinger, S.: MigCast in Monte Carlo: the impact of data model evolution in NoSQL databases. CoRR (2021)

  22. Kim, G., Debois, P., Willis, J., Humble, J.: The DevOps Handbook. IT Revolution Press (2016)

  23. Klettke, M., Störl, U., Shenavai, M., Scherzinger, S.: NoSQL schema evolution and big data migration at scale. In: Proceedings of SCDM 2016. IEEE (2016)

  24. Klímek, J., Malý, J., Nečaský, M., Holubová, I.: eXolutio: methodology for design and evolution of XML schemas using conceptual modeling. Informatica 26(3), 271 (2015)

  25. Levandoski, J.J., Larson, P., Stoica, R.: Identifying hot and cold data in main-memory databases. In: Proceedings of ICDE 2013. IEEE (2013)

  26. Meurice, L., Cleve, A.: Supporting schema evolution in schema-less NoSQL data stores. In: Proceedings of SANER 2017 (2017)

  27. Mior, M.J., Salem, K., Aboulnaga, A., Liu, R.: NoSE: schema design for NoSQL applications. IEEE Trans. Knowl. Data Eng. 29, 2275–2289 (2017)

  28. Preuveneers, D., Joosen, W.: Automated configuration of NoSQL performance and scalability tactics for data-intensive applications. Informatics 7, 29 (2020)

  29. Qiu, D., Li, B., Su, Z.: An empirical analysis of the co-evolution of schema and code in database applications. In: Proceedings of SIGSOFT 2013. ACM (2013)

  30. van Rijsbergen, C.J.: Information Retrieval. Butterworth-Heinemann, USA (1979)

  31. Saur, K., Dumitras, T., Hicks, M.W.: Evolving NoSQL databases without downtime. In: Proceedings of ICSME 2016. IEEE (2016)

  32. Scherzinger, S., Klettke, M., Störl, U.: Managing schema evolution in NoSQL data stores. In: Proceedings of DBPL 2013 (2013)

  33. Scherzinger, S., Sidortschuck, S.: An empirical study on the design and evolution of NoSQL database schemas. In: Dobbie, G., Frank, U., Kappel, G., Liddle, S.W., Mayr, H.C. (eds.) ER 2020. LNCS, vol. 12400, pp. 441–455. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-62522-1_33

  34. Skoulis, I., Vassiliadis, P., Zarras, A.: Growing up with stability: how open-source relational databases evolve. Inf. Syst. 53 (2015)

  35. Störl, U., et al.: Curating variational data in application development. In: Proceedings of ICDE 2018 (2018)

  36. Suárez-Otero, P., Mior, M.J., Suárez-Cabal, M.J., Tuya, J.: Maintaining NoSQL database quality during conceptual model evolution. In: IEEE International Conference on Big Data (Big Data) (2020)

  37. Tsoumakos, D., Konstantinou, I., Boumpouka, C., Sioutas, S., Koziris, N.: Automated, elastic resource provisioning for NoSQL clusters using TIRAMOLA. In: CCGrid 2013. IEEE (2013)

  38. Upton, G., Cook, I.: The Oxford Dictionary of Statistics. Oxford University Press, United Kingdom (2002)

  39. Vassiliadis, P.: Profiles of schema evolution in free open source software projects. In: Proceedings of ICDE 2021. IEEE (2021)

  40. Vassiliadis, P., Zarras, A., Skoulis, I.: Gravitating to rigidity: patterns of schema evolution, and its absence, in the lives of tables. Inf. Syst. 63 (2016)

  41. Zilio, D.C., et al.: DB2 design advisor. In: Proceedings of VLDB (2004)

Author information

Correspondence to Uta Störl.

Copyright information

© 2023 Springer Nature Switzerland AG

About this paper

Cite this paper

Hillenbrand, A., Störl, U. (2023). Managing Schema Migration in NoSQL Databases: Advisor Heuristics vs. Self-adaptive Schema Migration Strategies. In: Pires, L.F., Hammoudi, S., Seidewitz, E. (eds) Model-Driven Engineering and Software Development. MODELSWARD 2021, MODELSWARD 2022. Communications in Computer and Information Science, vol 1708. Springer, Cham. https://doi.org/10.1007/978-3-031-38821-7_11

  • DOI: https://doi.org/10.1007/978-3-031-38821-7_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-38820-0

  • Online ISBN: 978-3-031-38821-7

  • eBook Packages: Computer Science, Computer Science (R0)