An Empirical Study of (Multi-) Database Models in Open-Source Projects

Benats, Pol; Gobert, Maxime; Meurice, Loup; Nagy, Csaba; Cleve, Anthony

doi:10.1007/978-3-030-89022-3_8

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 13011)

Included in the following conference series: International Conference on Conceptual Modeling

Abstract

Managing data-intensive systems has long been recognized as an expensive and error-prone process. This is mainly due to the often implicit consistency relationships that hold between applications and their database. As new technologies emerged for specialized purposes (e.g., graph databases, document stores), the joint use of database models has also become popular. There are undeniable benefits of such multi-database models where developers combine various technologies. However, the side effects on design, querying, and maintenance are not well-known yet. In this paper, we study multi-database models in software systems by mining major open-source repositories. We consider four years of history, from 2017 to 2020, of a total number of 40,609 projects with databases. Our results confirm the emergence of hybrid data-intensive systems as we found (multi-) database models (e.g., relational and non-relational) used together in 16% of all database-dependent projects. One percent of the systems added, deleted, or changed a database during the four years. The majority (62%) of these systems had a single database before becoming hybrid, and another significant part (19%) became “mono-database” after initially using multiple databases. We examine the evolution of these systems to understand the rationale of the design choices of the developers. Our study aims to guide future research towards new challenges posed by those emerging data management architectures.

Notes

1. The detailed results are publicly available in a replication package [3].
2. Libraries.io - https://libraries.io/.
3. Octoverse - https://octoverse.github.com/.
4. DB-Engines Ranking - https://db-engines.com/en/ranking.
5. ORCID Source - https://github.com/ORCID/ORCID-Source.

References

Anderson, D., Hills, M.: Supporting analysis of SQL queries in PHP AiR. In: SCAM 2017, pp. 153–158. IEEE (2017)
Basciani, F., Rocco, J.D., Ruscio, D.D., Pierantonio, A., Iovino, L.: TyphonML: a modeling environment to develop hybrid polystores. In: MODELS 2020, pp. 2:1–2:5 (2020)
Benats, P.: Repl. pkg. https://github.com/benatspo/Multi-database_Models
Bernstein, P.A., Melnik, S.: Model management 2.0: manipulating richer mappings. In: SIGMOD 2007, pp. 1–12. ACM (2007)
Bird, C., Nagappan, N., Murphy, B., Gall, H., Devanbu, P.: Don’t touch my code! examining the effects of ownership on software quality. In: ESEC/FSE 2011, pp. 4–14. ACM (2011)
Borges, H., Tulio Valente, M.: What’s in a GitHub star? Understanding repository starring practices in a social coding platform. JSS 146, 112–129 (2018)
Cleve, A., Gobert, M., Meurice, L., Maes, J., Weber, J.: Understanding database schema evolution: a case study. Sci. Comput. Program. 97, 113–121 (2015)
Davoudian, A., Chen, L., Liu, M.: A survey on NoSQL stores. ACM Comput. Surv. 51, 1–43 (2018)
Decan, A., Goeminne, M., Mens, T.: On the interaction of relational database access technologies in open source Java projects. In: SATTOSE 2015, pp. 26–35 (2015)
Decan, A., Mens, T., Grosjean, P.: An empirical comparison of dependency network evolution in seven software packaging ecosystems. Empir. Softw. Eng. 24(1), 381–416 (2018). https://doi.org/10.1007/s10664-017-9589-y
Dimolikas, K., Zarras, A.V., Vassiliadis, P.: A study on the effect of a table’s involvement in foreign keys to its schema evolution. In: Dobbie, G., Frank, U., Kappel, G., Liddle, S.W., Mayr, H.C. (eds.) ER 2020. LNCS, vol. 12400, pp. 456–470. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-62522-1_34
Fink, J., Gobert, M., Cleve, A.: Adapting queries to database schema changes in hybrid polystores. In: SCAM 2020, pp. 127–131 (2020)
Gobert, M.: Schema evolution in hybrid databases systems. In: VLDB 2020 (2020)
Goeminne, M., Mens, T.: Towards a survival analysis of database framework usage in Java projects. In: ICSME 2015, pp. 551–555 (2015)
Jovanovic, P., Nadal, S., Romero, O., Abelló, A., Bilalli, B.: Quarry: a user-centered big data integration platform. Inf. Syst. Front. 23, 9–33 (2021)
Kalliamvakou, E., Gousios, G., Blincoe, K., Singer, L., German, D.M., Damian, D.: The promises and perils of mining GitHub. In: MSR 2014, pp. 92–101. ACM (2014)
Li, B., Poshyvanyk, D., Grechanik, M.: Automatically detecting integrity violations in database-centric applications. In: ICPC 2017, pp. 251–262. IEEE (2017)
Linares-Vásquez, M., Li, B., Vendome, C., Poshyvanyk, D.: Documenting database usages and schema constraints in database-centric applications. In: ISSTA 2016, pp. 270–281 (2016)
Meurice, L., Nagy, C., Cleve, A.: Detecting and preventing program inconsistencies under database schema evolution. In: QRS 2016, pp. 262–273. IEEE (2016)
Munaiah, N., Kroh, S., Cabrey, C., Nagappan, M.: Curating GitHub for engineered software projects. Empir. Softw. Eng. 22(6), 3219–3253 (2017). https://doi.org/10.1007/s10664-017-9512-6
Muse, B.A., Rahman, M.M., Nagy, C., Cleve, A., Khomh, F., Antoniol, G.: On the prevalence, impact, and evolution of SQL code smells in data-intensive systems. In: MSR 2020, pp. 327–338 (2020)
Qiu, D., Li, B., Su, Z.: An empirical analysis of the co-evolution of schema and code in database applications. In: ESEC/FSE 2013, pp. 125–135 (2013)
Ringlstetter, A., Scherzinger, S., Bissyandé, T.F.: Data model evolution using object-NoSQL mappers: folklore or state-of-the-art? In: 2nd International Workshop on BIG Data Software Engineering, pp. 33–36 (2016)
Scherzinger, S., Sidortschuck, S.: An empirical study on the design and evolution of NoSQL database schemas. In: Dobbie, G., Frank, U., Kappel, G., Liddle, S.W., Mayr, H.C. (eds.) ER 2020. LNCS, vol. 12400, pp. 441–455. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-62522-1_33
Shao, S., et al.: Database-access performance antipatterns in database-backed web applications. In: ICSME 2020, pp. 58–69. IEEE (2020)
Sjøberg, D.: Quantifying schema evolution. Inf. Softw. Technol. 35(1), 35–44 (1993)
Stonebraker, M., Deng, D., Brodie, M.L.: Database decay and how to avoid it. In: Proceedings of Big Data (2016)
Störl, U., Hauf, T., Klettke, M., Scherzinger, S.: Schemaless NoSQL data stores-Object-NoSQL Mappers to the rescue? In: BTW 2015 (2015)
Sun, Z., Liu, Y., Cheng, Z., Yang, C., Che, P.: Req2Lib: a semantic neural model for software library recommendation. In: SANER 2020, pp. 542–546 (2020)
Yamamoto, K., Kondo, M., Nishiura, K., Mizuno, O.: Which metrics should researchers use to collect repositories: an empirical study. In: QRS 2020, pp. 458–466 (2020)

Acknowledgments

This research is supported by the F.R.S.-FNRS and FWO EOS project 30446992 SECO-ASSIST and the SNF-FNRS project INSTINCT.

Author information

Authors and Affiliations

Namur Digital Institute, University of Namur, Namur, Belgium
Pol Benats, Maxime Gobert, Loup Meurice & Anthony Cleve
Software Institute, Università della Svizzera italiana, Lugano, Switzerland
Csaba Nagy

Corresponding author

Correspondence to Pol Benats.

Editor information

Editors and Affiliations

School of Computing and IT, University of Wollongong, Wollongong, NSW, Australia
Aditya Ghose
Department of Computer Science and Engineering, Chalmers | University of Gothenburg, Gothenburg, Sweden
Jennifer Horkoff
Universidade Federal do Espírito Santo, Vitória, Brazil
Vítor E. Silva Souza
Faculty of Business Administration, Memorial University of Newfoundland, St. John's, NL, Canada
Jeffrey Parsons
Faculty of Business Administration, Memorial University of Newfoundland, St. John's, NL, Canada
Joerg Evermann

About this paper

Cite this paper

Benats, P., Gobert, M., Meurice, L., Nagy, C., Cleve, A. (2021). An Empirical Study of (Multi-) Database Models in Open-Source Projects. In: Ghose, A., Horkoff, J., Silva Souza, V.E., Parsons, J., Evermann, J. (eds) Conceptual Modeling. ER 2021. Lecture Notes in Computer Science, vol 13011. Springer, Cham. https://doi.org/10.1007/978-3-030-89022-3_8

DOI: https://doi.org/10.1007/978-3-030-89022-3_8
Published: 16 October 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-89021-6
Online ISBN: 978-3-030-89022-3
eBook Packages: Computer Science, Computer Science (R0)
