Using Docker Containers to Improve Reproducibility in Software and Web Engineering Research

Cito, Jürgen; Ferme, Vincenzo; Gall, Harald C.

doi:10.1007/978-3-319-38791-8_58

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9671)

Included in the following conference series:

International Conference on Web Engineering

Abstract

The ability to replicate and reproduce scientific results has become an increasingly important topic for many academic disciplines. In computer science and, more specifically, software and web engineering, contributions of scientific work rely on developed algorithms, tools and prototypes, quantitative evaluations, and other computational analyses. Published code and data come with many undocumented assumptions, dependencies, and configurations that are internal knowledge and make reproducibility hard to achieve. This tutorial presents how Docker containers can overcome these issues and aid the reproducibility of research artifacts in software and web engineering and discusses their applications in the field.

1 Motivation

Reproducibility can be described as the repeatability of a certain process in order to establish a fact or the conditions under which we are able to observe the same fact [1]. The ability to replicate and reproduce scientific results has become an increasingly important topic for many academic disciplines. In computer science and, more specifically, software and web engineering (SE/WE), contributions of scientific work rely on developed algorithms, tools and prototypes, quantitative evaluations, and other computational analyses.

However, even if code and data are published alongside the paper as open source artifacts, they come with many undocumented assumptions, dependencies, and configurations that make reproducibility hard to achieve [2]. Reproduction of results often requires internal knowledge that is missing from the published manuscript.

Docker [3] is an open source container technology that can address the issues of reproducibility in SE/WE research. Containers can be seen as lightweight virtual machines that make it possible to set up a computational environment, including all necessary dependencies (e.g., libraries), configuration, code, and data, within a single unit (called an image). The steps necessary to reach the state captured in such an image are documented within a Dockerfile, a script that holds all infrastructure configuration and commands. Images can be distributed publicly and run seamlessly on Linux, and other major operating systems are supported through Docker Machine. The major difference from virtual machines is that Docker containers share the kernel with the underlying host machine, which enables much smaller image sizes and higher performance. This has made Docker particularly attractive to industry, where the technology has seen a steep rise in adoption [4, 5].
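To make the role of the Dockerfile more concrete, the following is a minimal Python sketch that writes out the setup steps of a research artifact as Dockerfile text. It is an illustration under stated assumptions, not the tutorial's own material: the helper `render_dockerfile`, the base image `ubuntu:16.04`, the packages, the `/experiment` directory, and the script name `run_analysis.py` are all hypothetical placeholders. Only the instructions `FROM`, `RUN`, `WORKDIR`, `COPY`, and `CMD` are standard Dockerfile syntax.

```python
import json
from typing import Sequence


def render_dockerfile(
    base_image: str,
    apt_packages: Sequence[str],
    workdir: str,
    command: Sequence[str],
) -> str:
    """Return Dockerfile text that records every environment setup step."""
    lines = [f"FROM {base_image}"]  # base layer the image starts from
    if apt_packages:
        # Install system dependencies in one layer; sorting keeps output stable.
        packages = " ".join(sorted(apt_packages))
        lines.append(
            "RUN apt-get update && apt-get install -y "
            f"{packages} && rm -rf /var/lib/apt/lists/*"
        )
    lines.append(f"WORKDIR {workdir}")  # directory used by later instructions
    lines.append("COPY . .")  # copy code and data from the build context
    # Exec-form CMD is a JSON array, so json.dumps quotes each argument.
    cmd = ", ".join(json.dumps(part) for part in command)
    lines.append(f"CMD [{cmd}]")
    return "\n".join(lines) + "\n"


# Hypothetical research artifact: an analysis script with two system packages.
dockerfile = render_dockerfile(
    base_image="ubuntu:16.04",
    apt_packages=["r-base", "python3"],
    workdir="/experiment",
    command=["python3", "run_analysis.py"],
)
print(dockerfile)
```

Saving the printed text as a file named `Dockerfile` next to the artifact and running `docker build -t my-artifact .` followed by `docker run my-artifact` would build and start the environment, where `my-artifact` is a placeholder image name. The sketch does not pin package versions, which a real replication package would likely want to do.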

Containers address the shortcomings of previous approaches (e.g., open sourcing), make artifacts in SE/WE research immediately usable to reviewers, interested readers, and future researchers, and improve the dissemination of scientific results.

This tutorial aims to give a hands-on introduction to Docker and to show how researchers in the SE/WE community can package an existing research project within a Docker container.

2 Importance to the Web Engineering Community

In recent years, software and web engineering conferences have started to encourage the submission of artifacts that support replication (e.g., replication packages at FSE, data showcase at MSR), signaling the importance of reproducibility in the field.

Reproducibility can be further improved if all artifacts belonging to a paper are packaged and documented in Docker containers. This allows others to make immediate use of the package without needing internal knowledge and without running into dependency issues.

This tutorial will familiarize the audience with how Docker containers work and how SE/WE researchers can leverage this technology to provide a reproducible package for their own research. More specifically, it will show hands-on how existing prototypes can be packaged to form a reproducible entity.

3 Outline

The tutorial is planned to take half a day (3 h). It will first introduce the basics of container technology, how it differs from virtual machines, and why it has gained widespread traction in industry. It will then convey the basic building blocks from which an image is constructed. In addition, it will give guidance on how best to derive a Dockerfile from working containers. It will then apply these basic techniques to a specific use case in the Web Engineering domain. The tutorial will conclude with a discussion of the advantages, challenges, and limitations of using containers to enable reproducibility in SE/WE research. The detailed outline of the tutorial is as follows.

1. Introduction to Containers and Reproducibility of SE/WE Research. The tutorial will start by introducing the term reproducibility in relation to SE/WE research. It will then introduce container technologies and how they can help with reproducibility.
2. Docker Container Basics. The tutorial will give a short overview of the Docker ecosystem and will introduce its basic building blocks and tooling. This block will also walk through the process and the concrete instructions necessary to build an initial container.
3. Web Engineering Use Case. This part of the tutorial will walk through a concrete use case that could be found in web engineering. The use case is based on a distributed, real-time Node.js application realized by multiple services. The concrete instructions to construct the Docker image will be elaborated along the way.
4. Open Challenges and Limitations. We conclude the tutorial with a discussion of the open challenges that remain in the area of reproducibility and of the limitations that exist.

All materials covered in this tutorial, including all scripts and resulting artifacts, will be made available online at:

http://www.ifi.uzh.ch/seal/people/cito.html.

4 Target Audience

This tutorial is suitable for both academic researchers and industry professionals who want to learn more about Docker containers and reproducibility in general. No prior knowledge of Docker or any other container technology is necessary. To follow along with the instructions, we assume basic skills in working with the Linux console (e.g., bash). Those who want to learn more about container technologies will be pointed to further material.

5 About the Organizers

The material to be included in the tutorial is authored by Jürgen Cito, Vincenzo Ferme, and Harald C. Gall.

Jürgen Cito is a Ph.D. candidate at the University of Zurich, Switzerland. In his research, he investigates the intersection between software engineering and cloud computing. In the summer of 2015, he was a research intern at the IBM T.J. Watson Research Center in New York, where he worked on cloud analytics based on Docker containers. That year he also won the local Docker Hackathon in New York City with the project docker-record.

More information is available at: http://www.ifi.uzh.ch/seal/people/cito.html.

Vincenzo Ferme is a Ph.D. candidate at the University of Lugano (USI), Switzerland. In his research, he is involved in the BenchFlow Project. The goal of the project is to design the first benchmark for assessing and comparing the performance of workflow management systems. In the context of the project, he is developing a framework for automated software performance benchmarking that largely relies on Docker.

More information is available at: http://www.vincenzoferme.it.

Harald C. Gall is a professor of software engineering in the Department of Informatics at the University of Zurich, Switzerland. His research interests include software engineering, focusing on software evolution, software quality analysis, software architecture, reengineering, collaborative software engineering, and service-centric software systems. He was the program chair of the European Software Engineering Conference and the ACM SIGSOFT ESEC-FSE in 2005 and the program co-chair of ICSE 2011.

More information is available at: http://www.ifi.uzh.ch/seal/people/gall.html.

References

1. Mockus, A., Anda, B., Sjøberg, D.I.: Experiences from replicating a case study to investigate reproducibility of software development
2. Boettiger, C.: An introduction to docker for reproducible research. ACM SIGOPS Oper. Syst. Rev. 49(1), 71–79 (2015)
3. Merkel, D.: Docker: lightweight linux containers for consistent development and deployment. Linux J. 2014(239), 2 (2014)
4. Gerber, A.: The state of containers and the docker ecosystem: 2015. Technical report, White paper
5. Cito, J., Leitner, P., Fritz, T., Gall, H.C.: The making of cloud applications: an empirical study on software development for the cloud. In: Proceedings of the 10th Joint Meeting on Foundations of Software Engineering, ESEC/FSE 2015, pp. 393–403. ACM, New York (2015)

Author information

Authors and Affiliations

University of Zurich, Zurich, Switzerland
Jürgen Cito & Harald C. Gall
University of Lugano (USI), Lugano, Switzerland
Vincenzo Ferme

Corresponding author

Correspondence to Jürgen Cito.

Editor information

Editors and Affiliations

Dept. of Software & Computer Technology, Delft Univ. of Technology, Delft, Zuid-Holland, The Netherlands
Alessandro Bozzon
Department of Informatics, University of Fribourg, Fribourg, Switzerland
Philippe Cudre-Maroux
Faculty of Informatics, Università della Svizzera italiana (USI), Lugano, Switzerland
Cesare Pautasso

About this paper

Cite this paper

Cito, J., Ferme, V., Gall, H.C. (2016). Using Docker Containers to Improve Reproducibility in Software and Web Engineering Research. In: Bozzon, A., Cudre-Maroux, P., Pautasso, C. (eds) Web Engineering. ICWE 2016. Lecture Notes in Computer Science, vol 9671. Springer, Cham. https://doi.org/10.1007/978-3-319-38791-8_58

DOI: https://doi.org/10.1007/978-3-319-38791-8_58
Published: 25 May 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-38790-1
Online ISBN: 978-3-319-38791-8
eBook Packages: Computer Science, Computer Science (R0)
