
An Empirical Study of (Multi-) Database Models in Open-Source Projects

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 13011)

Abstract

Managing data-intensive systems has long been recognized as an expensive and error-prone process. This is mainly due to the often implicit consistency relationships that hold between applications and their database. As new technologies have emerged for specialized purposes (e.g., graph databases, document stores), the joint use of several database models within one system has also become popular. Such multi-database models, in which developers combine various technologies, have undeniable benefits. However, their side effects on design, querying, and maintenance are not yet well known. In this paper, we study multi-database models in software systems by mining major open-source repositories. We consider four years of history, from 2017 to 2020, of a total of 40,609 projects that use databases. Our results confirm the emergence of hybrid data-intensive systems: we found several database models (e.g., relational and non-relational) used together in 16% of all database-dependent projects. One percent of the systems added, deleted, or changed a database during the four-year period. The majority (62%) of these systems had a single database before becoming hybrid, and another significant share (19%) became “mono-database” after initially using multiple databases. We examine the evolution of these systems to understand the rationale behind the developers' design choices. Our study aims to guide future research towards the new challenges posed by these emerging data management architectures.
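The two kinds of measurement summarized above (whether a project is hybrid, and how database usage changed between 2017 and 2020) can be illustrated with a minimal sketch. It is not the authors' pipeline and rests on stated assumptions: the input format (per-project yearly snapshots of detected database technologies) is hypothetical; "hybrid" is assumed here to mean two or more distinct database technologies, although the study may count data models rather than technologies; and the overall transition compares only the earliest and latest snapshots. The function names `kind`, `has_changed`, `transition`, and `share` are illustrative, and the toy data is invented, not taken from the study.

```python
from typing import Dict, FrozenSet, List

# Hypothetical input format: year -> set of database technologies detected
# in a project at that point in its history (e.g., from its dependencies).
History = Dict[int, FrozenSet[str]]


def kind(dbs: FrozenSet[str]) -> str:
    """Label one snapshot; 'hybrid' here (an assumption) means two or more distinct technologies."""
    if not dbs:
        return "none"
    return "mono" if len(dbs) == 1 else "hybrid"


def has_changed(history: History) -> bool:
    """True if a database was added, removed, or replaced between consecutive snapshots."""
    years = sorted(history)
    return any(history[a] != history[b] for a, b in zip(years, years[1:]))


def transition(history: History) -> str:
    """Compare only the earliest and latest snapshots, e.g. 'mono->hybrid'."""
    years = sorted(history)
    return f"{kind(history[years[0]])}->{kind(history[years[-1]])}"


def share(projects: List[History], label: str) -> float:
    """Fraction of changed projects whose overall transition equals `label`."""
    changed = [p for p in projects if has_changed(p)]
    if not changed:
        return 0.0
    return sum(transition(p) == label for p in changed) / len(changed)


# Toy data for illustration only; not taken from the study.
projects: List[History] = [
    {2017: frozenset({"mysql"}), 2018: frozenset({"mysql", "redis"}),
     2020: frozenset({"mysql", "redis"})},
    {2017: frozenset({"mongodb", "redis"}), 2020: frozenset({"mongodb"})},
    {2017: frozenset({"postgresql"}), 2020: frozenset({"postgresql"})},
]

for p in projects:
    print(has_changed(p), transition(p))
print(share(projects, "mono->hybrid"))
```

On the toy data, two of the three projects change, one going from a single database to a hybrid setup and one the other way, so `share(projects, "mono->hybrid")` is 0.5. Because `transition` looks only at the endpoints, a project that goes mono, then hybrid, then mono again counts as changed but reports `mono->mono`; a per-year comparison would be needed to capture such intermediate steps.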

Notes

  1. The detailed results are publicly available in a replication package [3].

  2. Libraries.io - https://libraries.io/.

  3. Octoverse - https://octoverse.github.com/.

  4. DB-Engines Ranking - https://db-engines.com/en/ranking.

  5. ORCID Source - https://github.com/ORCID/ORCID-Source.

References

  1. Anderson, D., Hills, M.: Supporting analysis of SQL queries in PHP AiR. In: SCAM 2017, pp. 153–158. IEEE (2017)

  2. Basciani, F., Rocco, J.D., Ruscio, D.D., Pierantonio, A., Iovino, L.: TyphonML: a modeling environment to develop hybrid polystores. In: MODELS 2020, pp. 2:1–2:5 (2020)

  3. Benats, P.: Replication package. https://github.com/benatspo/Multi-database_Models

  4. Bernstein, P.A., Melnik, S.: Model management 2.0: manipulating richer mappings. In: SIGMOD 2007, pp. 1–12. ACM (2007)

  5. Bird, C., Nagappan, N., Murphy, B., Gall, H., Devanbu, P.: Don’t touch my code! Examining the effects of ownership on software quality. In: ESEC/FSE 2011, pp. 4–14. ACM (2011)

  6. Borges, H., Tulio Valente, M.: What’s in a GitHub star? Understanding repository starring practices in a social coding platform. J. Syst. Softw. 146, 112–129 (2018)

  7. Cleve, A., Gobert, M., Meurice, L., Maes, J., Weber, J.: Understanding database schema evolution: a case study. Sci. Comput. Program. 97, 113–121 (2015)

  8. Davoudian, A., Chen, L., Liu, M.: A survey on NoSQL stores. ACM Comput. Surv. 51, 1–43 (2018)

  9. Decan, A., Goeminne, M., Mens, T.: On the interaction of relational database access technologies in open source Java projects. In: SATTOSE 2015, pp. 26–35 (2015)

  10. Decan, A., Mens, T., Grosjean, P.: An empirical comparison of dependency network evolution in seven software packaging ecosystems. Empir. Softw. Eng. 24(1), 381–416 (2018). https://doi.org/10.1007/s10664-017-9589-y

  11. Dimolikas, K., Zarras, A.V., Vassiliadis, P.: A study on the effect of a table’s involvement in foreign keys to its schema evolution. In: Dobbie, G., Frank, U., Kappel, G., Liddle, S.W., Mayr, H.C. (eds.) ER 2020. LNCS, vol. 12400, pp. 456–470. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-62522-1_34

  12. Fink, J., Gobert, M., Cleve, A.: Adapting queries to database schema changes in hybrid polystores. In: SCAM 2020, pp. 127–131 (2020)

  13. Gobert, M.: Schema evolution in hybrid databases systems. In: VLDB 2020 (2020)

  14. Goeminne, M., Mens, T.: Towards a survival analysis of database framework usage in Java projects. In: ICSME 2015, pp. 551–555 (2015)

  15. Jovanovic, P., Nadal, S., Romero, O., Abelló, A., Bilalli, B.: Quarry: a user-centered big data integration platform. Inf. Syst. Front. 23, 9–33 (2021)

  16. Kalliamvakou, E., Gousios, G., Blincoe, K., Singer, L., German, D.M., Damian, D.: The promises and perils of mining GitHub. In: MSR 2014, pp. 92–101. ACM (2014)

  17. Li, B., Poshyvanyk, D., Grechanik, M.: Automatically detecting integrity violations in database-centric applications. In: ICPC 2017, pp. 251–262. IEEE (2017)

  18. Linares-Vásquez, M., Li, B., Vendome, C., Poshyvanyk, D.: Documenting database usages and schema constraints in database-centric applications. In: ISSTA 2016, pp. 270–281 (2016)

  19. Meurice, L., Nagy, C., Cleve, A.: Detecting and preventing program inconsistencies under database schema evolution. In: QRS 2016, pp. 262–273. IEEE (2016)

  20. Munaiah, N., Kroh, S., Cabrey, C., Nagappan, M.: Curating GitHub for engineered software projects. Empir. Softw. Eng. 22(6), 3219–3253 (2017). https://doi.org/10.1007/s10664-017-9512-6

  21. Muse, B.A., Rahman, M.M., Nagy, C., Cleve, A., Khomh, F., Antoniol, G.: On the prevalence, impact, and evolution of SQL code smells in data-intensive systems. In: MSR 2020, pp. 327–338 (2020)

  22. Qiu, D., Li, B., Su, Z.: An empirical analysis of the co-evolution of schema and code in database applications. In: ESEC/FSE 2013, pp. 125–135 (2013)

  23. Ringlstetter, A., Scherzinger, S., Bissyandé, T.F.: Data model evolution using object-NoSQL mappers: folklore or state-of-the-art? In: 2nd International Workshop on BIG Data Software Engineering, pp. 33–36 (2016)

  24. Scherzinger, S., Sidortschuck, S.: An empirical study on the design and evolution of NoSQL database schemas. In: Dobbie, G., Frank, U., Kappel, G., Liddle, S.W., Mayr, H.C. (eds.) ER 2020. LNCS, vol. 12400, pp. 441–455. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-62522-1_33

  25. Shao, S., et al.: Database-access performance antipatterns in database-backed web applications. In: ICSME 2020, pp. 58–69. IEEE (2020)

  26. Sjøberg, D.: Quantifying schema evolution. Inf. Softw. Technol. 35(1), 35–44 (1993)

  27. Stonebraker, M., Deng, D., Brodie, M.L.: Database decay and how to avoid it. In: Proceedings of Big Data (2016)

  28. Störl, U., Hauf, T., Klettke, M., Scherzinger, S.: Schemaless NoSQL data stores – Object-NoSQL mappers to the rescue? In: BTW 2015 (2015)

  29. Sun, Z., Liu, Y., Cheng, Z., Yang, C., Che, P.: Req2Lib: a semantic neural model for software library recommendation. In: SANER 2020, pp. 542–546 (2020)

  30. Yamamoto, K., Kondo, M., Nishiura, K., Mizuno, O.: Which metrics should researchers use to collect repositories: an empirical study. In: QRS 2020, pp. 458–466 (2020)

Acknowledgments

This research is supported by the F.R.S.-FNRS and FWO EOS project 30446992 SECO-ASSIST and the SNF-FNRS project INSTINCT.

Author information

Correspondence to Pol Benats.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Benats, P., Gobert, M., Meurice, L., Nagy, C., Cleve, A. (2021). An Empirical Study of (Multi-) Database Models in Open-Source Projects. In: Ghose, A., Horkoff, J., Silva Souza, V.E., Parsons, J., Evermann, J. (eds) Conceptual Modeling. ER 2021. Lecture Notes in Computer Science, vol 13011. Springer, Cham. https://doi.org/10.1007/978-3-030-89022-3_8

  • DOI: https://doi.org/10.1007/978-3-030-89022-3_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-89021-6

  • Online ISBN: 978-3-030-89022-3

  • eBook Packages: Computer Science, Computer Science (R0)
