research-article

What Quality Aspects Influence the Adoption of Docker Images?

Published: 30 September 2023

Abstract

Docker is a containerization technology that allows developers to ship software applications along with their dependencies in Docker images. Developers can extend existing images by using them as base images when writing Dockerfiles. However, many functionally equivalent alternative base images are available. Although many studies define and evaluate quality features that can be extracted from Docker artifacts, the criteria by which developers choose one base image over another remain unclear.
In this article, we aim to fill this gap. First, we conduct a literature review through which we define a taxonomy of quality features, identifying two main groups: configuration-related features (i.e., mainly related to the Dockerfile and the image build process) and externally observable features (i.e., what Docker image users can observe). Second, we run an empirical study of developers’ preferences for 2,441 Docker images in 1,911 open source software projects. We aim to understand how the externally observable features influence developers’ preferences and how they relate to the configuration-related features. Our results pave the way to the definition of a reliable quality measure for Docker artifacts, along with tools that support developers in the quality-aware development of such artifacts.

    Information

    Published In

    ACM Transactions on Software Engineering and Methodology, Volume 32, Issue 6
    November 2023
    949 pages
    ISSN:1049-331X
    EISSN:1557-7392
    DOI:10.1145/3625557
    Editor: Mauro Pezzè

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 30 September 2023
    Online AM: 31 May 2023
    Accepted: 07 April 2023
    Revised: 28 December 2022
    Received: 20 July 2022
    Published in TOSEM Volume 32, Issue 6

    Author Tags

    1. Empirical software engineering
    2. software maintenance
    3. container virtualization
    4. Docker

    Qualifiers

    • Research-article
