Abstract
Managing data-intensive systems has long been recognized as an expensive and error-prone process. This is mainly due to the often implicit consistency relationships that hold between applications and their databases. As new technologies emerged for specialized purposes (e.g., graph databases, document stores), the joint use of several database models has also become popular. Such multi-database models, in which developers combine various technologies, have undeniable benefits. However, their side effects on design, querying, and maintenance are not yet well understood. In this paper, we study multi-database models in software systems by mining major open-source repositories. We consider four years of history, from 2017 to 2020, of a total of 40,609 projects with databases. Our results confirm the emergence of hybrid data-intensive systems: we found multiple database models (e.g., relational and non-relational) used together in 16% of all database-dependent projects. One percent of the systems added, deleted, or changed a database during the four years. The majority (62%) of these systems had a single database before becoming hybrid, and another significant share (19%) became “mono-database” after initially using multiple databases. We examine the evolution of these systems to understand the rationale behind the developers' design choices. Our study aims to guide future research towards the new challenges posed by these emerging data management architectures.
Notes
1. The detailed results are publicly available in a replication package [3].
2. Libraries.io - https://libraries.io/.
3. Octoverse - https://octoverse.github.com/.
4. DB-Engines Ranking - https://db-engines.com/en/ranking.
5. ORCID Source - https://github.com/ORCID/ORCID-Source.
References
Anderson, D., Hills, M.: Supporting analysis of SQL queries in PHP AiR. In: SCAM 2017, pp. 153–158. IEEE (2017)
Basciani, F., Rocco, J.D., Ruscio, D.D., Pierantonio, A., Iovino, L.: TyphonML: a modeling environment to develop hybrid polystores. In: MODELS 2020, pp. 2:1–2:5 (2020)
Benats, P.: Repl. pkg. https://github.com/benatspo/Multi-database_Models
Bernstein, P.A., Melnik, S.: Model management 2.0: manipulating richer mappings. In: SIGMOD 2007, pp. 1–12. ACM (2007)
Bird, C., Nagappan, N., Murphy, B., Gall, H., Devanbu, P.: Don’t touch my code! examining the effects of ownership on software quality. In: ESEC/FSE 2011, pp. 4–14. ACM (2011)
Borges, H., Tulio Valente, M.: What’s in a GitHub star? Understanding repository starring practices in a social coding platform. JSS 146, 112–129 (2018)
Cleve, A., Gobert, M., Meurice, L., Maes, J., Weber, J.: Understanding database schema evolution: a case study. Sci. Comput. Program. 97, 113–121 (2015)
Davoudian, A., Chen, L., Liu, M.: A survey on NoSQL stores. ACM Comput. Surv. 51, 1–43 (2018)
Decan, A., Goeminne, M., Mens, T.: On the interaction of relational database access technologies in open source Java projects. In: SATTOSE 2015, pp. 26–35 (2015)
Decan, A., Mens, T., Grosjean, P.: An empirical comparison of dependency network evolution in seven software packaging ecosystems. Empir. Softw. Eng. 24(1), 381–416 (2018). https://doi.org/10.1007/s10664-017-9589-y
Dimolikas, K., Zarras, A.V., Vassiliadis, P.: A study on the effect of a table’s involvement in foreign keys to its schema evolution. In: Dobbie, G., Frank, U., Kappel, G., Liddle, S.W., Mayr, H.C. (eds.) ER 2020. LNCS, vol. 12400, pp. 456–470. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-62522-1_34
Fink, J., Gobert, M., Cleve, A.: Adapting queries to database schema changes in hybrid polystores. In: SCAM 2020, pp. 127–131 (2020)
Gobert, M.: Schema evolution in hybrid databases systems. In: VLDB 2020 (2020)
Goeminne, M., Mens, T.: Towards a survival analysis of database framework usage in Java projects. In: ICSME 2015, pp. 551–555 (2015)
Jovanovic, P., Nadal, S., Romero, O., Abelló, A., Bilalli, B.: Quarry: a user-centered big data integration platform. Inf. Syst. Front. 23, 9–33 (2021)
Kalliamvakou, E., Gousios, G., Blincoe, K., Singer, L., German, D.M., Damian, D.: The promises and perils of mining GitHub. In: MSR 2014, pp. 92–101. ACM (2014)
Li, B., Poshyvanyk, D., Grechanik, M.: Automatically detecting integrity violations in database-centric applications. In: ICPC 2017, pp. 251–262. IEEE (2017)
Linares-Vásquez, M., Li, B., Vendome, C., Poshyvanyk, D.: Documenting database usages and schema constraints in database-centric applications. In: ISSTA 2016, pp. 270–281 (2016)
Meurice, L., Nagy, C., Cleve, A.: Detecting and preventing program inconsistencies under database schema evolution. In: QRS 2016, pp. 262–273. IEEE (2016)
Munaiah, N., Kroh, S., Cabrey, C., Nagappan, M.: Curating GitHub for engineered software projects. Empir. Softw. Eng. 22(6), 3219–3253 (2017). https://doi.org/10.1007/s10664-017-9512-6
Muse, B.A., Rahman, M.M., Nagy, C., Cleve, A., Khomh, F., Antoniol, G.: On the prevalence, impact, and evolution of SQL code smells in data-intensive systems. In: MSR 2020, pp. 327–338 (2020)
Qiu, D., Li, B., Su, Z.: An empirical analysis of the co-evolution of schema and code in database applications. In: ESEC/FSE 2013, pp. 125–135 (2013)
Ringlstetter, A., Scherzinger, S., Bissyandé, T.F.: Data model evolution using object-NoSQL mappers: folklore or state-of-the-art? In: 2nd International Workshop on BIG Data Software Engineering, pp. 33–36 (2016)
Scherzinger, S., Sidortschuck, S.: An empirical study on the design and evolution of NoSQL database schemas. In: Dobbie, G., Frank, U., Kappel, G., Liddle, S.W., Mayr, H.C. (eds.) ER 2020. LNCS, vol. 12400, pp. 441–455. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-62522-1_33
Shao, S., et al.: Database-access performance antipatterns in database-backed web applications. In: ICSME 2020, pp. 58–69. IEEE (2020)
Sjøberg, D.: Quantifying schema evolution. Inf. Softw. Technol. 35(1), 35–44 (1993)
Stonebraker, M., Deng, D., Brodie, M.L.: Database decay and how to avoid it. In: Proceedings of Big Data (2016)
Störl, U., Hauf, T., Klettke, M., Scherzinger, S.: Schemaless NoSQL data stores - Object-NoSQL Mappers to the rescue? In: BTW 2015 (2015)
Sun, Z., Liu, Y., Cheng, Z., Yang, C., Che, P.: Req2Lib: a semantic neural model for software library recommendation. In: SANER 2020, pp. 542–546 (2020)
Yamamoto, K., Kondo, M., Nishiura, K., Mizuno, O.: Which metrics should researchers use to collect repositories: an empirical study. In: QRS 2020, pp. 458–466 (2020)
Acknowledgments
This research is supported by the F.R.S.-FNRS and FWO EOS project 30446992 SECO-ASSIST and the SNF-FNRS project INSTINCT.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Benats, P., Gobert, M., Meurice, L., Nagy, C., Cleve, A. (2021). An Empirical Study of (Multi-) Database Models in Open-Source Projects. In: Ghose, A., Horkoff, J., Silva Souza, V.E., Parsons, J., Evermann, J. (eds) Conceptual Modeling. ER 2021. Lecture Notes in Computer Science, vol 13011. Springer, Cham. https://doi.org/10.1007/978-3-030-89022-3_8
DOI: https://doi.org/10.1007/978-3-030-89022-3_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-89021-6
Online ISBN: 978-3-030-89022-3
eBook Packages: Computer Science, Computer Science (R0)