DOI: 10.1145/3275219.3275220
research-article

D-Tagger: A Tag Recommendation Approach for Docker Repositories

Published: 16 September 2018

Abstract

Docker repositories usually contain Docker images and Dockerfiles: Docker images are off-the-shelf artifacts, and Dockerfiles specify how to build Docker images automatically, following the notion of Infrastructure-as-Code. Given the huge number of Docker repositories, tag recommendation is essential for ensuring that relevant ones can be easily retrieved, because tagging is a practical way of describing, bookmarking, navigating and searching software objects. However, Docker Hub tags offer little support for semantically describing repositories, and manual tagging remains an exhausting and time-consuming task.
A Dockerfile specifies a Docker repository in a rigorous and compact way. Thus, based on Dockerfile analysis, this paper proposes D-Tagger, a tag recommendation approach that addresses the problem of multi-labeling Docker repositories. When treating the Dockerfile as a specific description, D-Tagger models a repository with its labeled tags and the terms extracted from its Dockerfile, and employs the Labeled Latent Dirichlet Allocation algorithm to recommend tags. When treating the Dockerfile as configuration code, D-Tagger constructs a feature model based on the key instructions that identify the Dockerfile, and then recommends tags with a similarity-based ranking method. Finally, D-Tagger combines the two perspectives. We evaluate D-Tagger on over 100,000 Docker Hub repositories (collected up to Aug. 15, 2017). The experimental results show that D-Tagger achieves a Recall@5 of 0.675 and a Recall@10 of 0.712. In addition, D-Tagger outperforms the state-of-the-art approach when tagging repositories without description documents.
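To make the configuration-code perspective and the reported metrics concrete, the following is a minimal sketch, not the implementation used by D-Tagger. It assumes a hypothetical set of key instructions, uses Jaccard similarity as a stand-in for the unspecified similarity measure, and assumes the common per-repository definition of Recall@k (correctly recommended tags among the top k, divided by the number of true tags). All function names are illustrative.

```python
from collections import defaultdict

# Assumption: which instructions count as "key" is a hypothetical choice here.
KEY_INSTRUCTIONS = {"FROM", "RUN", "EXPOSE", "CMD", "ENTRYPOINT"}


def extract_features(dockerfile_text):
    """Return a set of (instruction, first argument) pairs.

    Simplification: backslash line continuations are not joined.
    """
    features = set()
    for line in dockerfile_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 2)
        instruction = parts[0].upper()
        if instruction in KEY_INSTRUCTIONS and len(parts) > 1:
            features.add((instruction, parts[1]))
    return features


def jaccard(a, b):
    """Jaccard similarity of two sets (assumed similarity measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def recommend_tags(query_features, labeled_repos, k=5):
    """Rank tags by the summed similarity of the labeled repositories carrying them.

    labeled_repos is a list of (feature_set, tag_list) pairs.
    """
    scores = defaultdict(float)
    for features, tags in labeled_repos:
        sim = jaccard(query_features, features)
        for tag in tags:
            scores[tag] += sim
    # Sort by descending score, then alphabetically to break ties.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, score in ranked[:k] if score > 0]


def recall_at_k(recommended, true_tags, k):
    """Fraction of a repository's true tags found among the top-k recommendations."""
    true_tags = set(true_tags)
    if not true_tags:
        return 0.0
    hits = len(set(recommended[:k]) & true_tags)
    return hits / len(true_tags)


if __name__ == "__main__":
    labeled = [
        (extract_features("FROM ubuntu:16.04\nRUN apt-get update\nEXPOSE 80"),
         ["nginx", "web"]),
        (extract_features("FROM python:3\nRUN pip install flask\nEXPOSE 5000"),
         ["python", "web"]),
    ]
    query = extract_features(
        "FROM ubuntu:16.04\nRUN apt-get install -y nginx\nEXPOSE 80\n"
    )
    top = recommend_tags(query, labeled, k=5)
    print(top, recall_at_k(top, ["web", "proxy"], 5))
```

A corpus-level Recall@k, such as the figures reported above, would then be the mean of `recall_at_k` over all evaluated repositories, again under the assumed definition.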

Cited By

  • (2024) A Systematic Literature Review on Maintenance of Software Containers. ACM Computing Surveys 56:8 (1-38). DOI: 10.1145/3645092. Online publication date: 10-Apr-2024
  • (2019) Semi-Supervised Learning Based Tag Recommendation for Docker Repositories. Journal of Computer Science and Technology 34:5 (957-971). DOI: 10.1007/s11390-019-1954-4. Online publication date: 1-Sep-2019

Published In

Internetware '18: Proceedings of the 10th Asia-Pacific Symposium on Internetware
September 2018
167 pages
ISBN:9781450365901
DOI:10.1145/3275219
In-Cooperation

  • Institute of Software, Chinese Academy of Sciences
  • CCF: China Computer Federation

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Docker
  2. Docker repository
  3. Dockerfile
  4. tag recommendation

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

Internetware '18

Acceptance Rates

Internetware '18 Paper Acceptance Rate 20 of 26 submissions, 77%;
Overall Acceptance Rate 55 of 111 submissions, 50%
