
D-Tagger: A Tag Recommendation Approach for Docker Repositories

Authors:

Jun Wei

Internetware '18: Proceedings of the 10th Asia-Pacific Symposium on Internetware

Article No.: 3, Pages 1 - 10

https://doi.org/10.1145/3275219.3275220

Published: 16 September 2018

Abstract

Docker repositories usually contain Docker images and Dockerfiles: Docker images are off-the-shelf artifacts, and Dockerfiles specify how to build Docker images automatically, following the notion of Infrastructure-as-Code. Given the huge number of Docker repositories, tag recommendation is essential to ensure that relevant ones can be easily retrieved, because tagging is practical for describing, bookmarking, navigating, and searching software objects. However, in Docker Hub, tags are not well supported for semantically describing repositories, and manual tagging remains an exhausting and time-consuming task.

A Dockerfile specifies a Docker repository in a rigorous and compact way. Thus, based on Dockerfile analysis, this paper proposes D-Tagger, a tag recommendation approach that addresses the problem of multi-labeling Docker repositories. When treating a Dockerfile as a specific description, D-Tagger models a repository with its labeled tags and the terms extracted from its Dockerfile, and employs the Labeled Latent Dirichlet Allocation algorithm to recommend tags. When treating a Dockerfile as configuration code, D-Tagger constructs a feature model based on the key instructions that identify the Dockerfile, and then recommends tags with a similarity-based ranking method. Finally, D-Tagger combines the two perspectives. We evaluate D-Tagger on over 100,000 Docker Hub repositories (accessed up to Aug. 15, 2017). The experimental results show that D-Tagger achieves a Recall@5 of 0.675 and a Recall@10 of 0.712. In addition, D-Tagger outperforms the state-of-the-art approach when tagging repositories without description documents.
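To make the similarity-based ranking and the Recall@k metric more concrete, the following is a minimal sketch, not the paper's implementation. It assumes that a repository's feature model is simply the set of its Dockerfile instruction strings, that similarity is the Jaccard coefficient, and that Recall@k is the fraction of a repository's true tags found among the top k recommendations. The function names (`jaccard`, `rank_tags`, `recall_at_k`) and the toy data are hypothetical.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard coefficient of two feature sets (0.0 when both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_tags(query_features, labeled_repos, k=5):
    """Rank candidate tags for an untagged repository.

    Each labeled repository votes for its tags with a weight equal to its
    similarity to the query. This voting scheme is an assumption made for
    illustration only.
    """
    scores = defaultdict(float)
    for features, tags in labeled_repos:
        sim = jaccard(query_features, features)
        for tag in tags:
            scores[tag] += sim
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]


def recall_at_k(recommended, true_tags, k):
    """Fraction of the true tags that appear in the top-k recommendations."""
    true_tags = set(true_tags)
    if not true_tags:
        return 0.0
    hits = len(set(recommended[:k]) & true_tags)
    return hits / len(true_tags)


# Toy data: (Dockerfile instruction set, labeled tags). Entirely made up.
labeled = [
    ({"FROM ubuntu", "RUN apt-get install nginx", "EXPOSE 80"},
     {"nginx", "web"}),
    ({"FROM ubuntu", "RUN apt-get install mysql-server", "EXPOSE 3306"},
     {"mysql", "database"}),
    ({"FROM alpine", "RUN apk add nginx", "EXPOSE 80"},
     {"nginx", "proxy"}),
]
query = {"FROM ubuntu", "RUN apt-get install nginx", "EXPOSE 80", "CMD nginx"}

top = rank_tags(query, labeled, k=2)
recall = recall_at_k(top, {"nginx", "web", "server"}, k=2)
print(top, round(recall, 3))
```

In this toy example the query shares most of its instructions with the first labeled repository, so that repository's tags dominate the ranking. Averaging `recall_at_k` over all test repositories would give a corpus-level figure comparable in form to the Recall@5 and Recall@10 values reported above.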


Cited By

Malhotra R, Bansal A, Kessentini M. (2024). A Systematic Literature Review on Maintenance of Software Containers. ACM Computing Surveys 56:8 (1-38). DOI: 10.1145/3645092. Online publication date: 10-Apr-2024.
https://dl.acm.org/doi/10.1145/3645092
Chen W, Zhou J, Zhu J, Wu G, Wei J. (2019). Semi-Supervised Learning Based Tag Recommendation for Docker Repositories. Journal of Computer Science and Technology 34:5 (957-971). DOI: 10.1007/s11390-019-1954-4. Online publication date: 1-Sep-2019.
https://dl.acm.org/doi/10.1007/s11390-019-1954-4

Index Terms

1. Software and its engineering
  1. Software notations and tools
    1. Software libraries and repositories
    2. Software maintenance tools



Published In


Internetware '18: Proceedings of the 10th Asia-Pacific Symposium on Internetware

September 2018

167 pages

ISBN:9781450365901

DOI:10.1145/3275219

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

Institute of Software, Chinese Academy of Sciences
CCF: China Computer Federation

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

Research-article
Research
Refereed limited

Conference

Internetware '18

Internetware '18: The Tenth Asia-Pacific Symposium on Internetware

September 16, 2018

Beijing, China

Acceptance Rates

Internetware '18 Paper Acceptance Rate 20 of 26 submissions, 77%;

Overall Acceptance Rate 55 of 111 submissions, 50%

