Semi-Supervised Learning Based Tag Recommendation for Docker Repositories

Chen, Wei; Zhou, Jia-Hong; Zhu, Jia-Xin; Wu, Guo-Quan; Wei, Jun

doi:10.1007/s11390-019-1954-4

Regular Paper
Published: 06 September 2019

Volume 34, pages 957–971 (2019)
Journal of Computer Science and Technology

Wei Chen^1,2, Jia-Hong Zhou^1,2, Jia-Xin Zhu^1,2, Guo-Quan Wu^1,2,3 & Jun Wei^1,2,3

216 Accesses
6 Citations

Abstract

Docker has recently become the mainstream technology for providing reusable software artifacts, and developers can easily build and deploy their applications with it. A large number of reusable Docker images are now publicly shared in online communities, and semantic tags can be created to help developers reuse these images effectively. However, the communities do not provide tagging services, and manual tagging is tedious and time-consuming. This paper addresses the problem with a semi-supervised learning-based approach named SemiTagRec, which contains four components: (1) the predictor, which calculates the probability of assigning a specific tag to a given Docker repository; (2) the extender, which introduces new tags as candidates based on tag correlation analysis; (3) the evaluator, which scores the candidate tags with a logistic regression model; and (4) the integrator, which computes a final score by combining the outputs of the predictor and the evaluator, and then assigns the highest-scoring tags to the given Docker repository. SemiTagRec adds the newly tagged repositories to the training data for the next round of training. In this way, it iteratively trains the predictor on the accumulated tagged repositories and the extended tag vocabulary to achieve high tag recommendation accuracy. The experimental results show that SemiTagRec outperforms the compared approaches, achieving a Recall@5 of 0.688 and a Recall@10 of 0.781.
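
The abstract names SemiTagRec's four components and its self-training loop, but it does not give their internals. The following is a minimal Python sketch of how such a pipeline could fit together, assuming a toy bag-of-words representation of repositories. It is not the authors' implementation. The following are all hypothetical stand-ins chosen for illustration:

* every function name;
* the word-tag counting predictor;
* the co-occurrence-based extender;
* the fixed logistic weights;
* the blending factor `alpha`;
* the `threshold`.

In this sketch the extender can only propose tags already seen in the training data, whereas SemiTagRec's extender grows the tag vocabulary. Recall@k uses the common per-repository definition (the share of a repository's true tags found in its top k recommendations), which may differ in detail from the paper's.

```python
import math
from collections import Counter, defaultdict

# Hypothetical data model: a repository is a set of description words;
# a labeled repository is a (words, tags) pair.


def train_predictor(labeled):
    """Predictor (simplified): count how often each word co-occurs with each tag."""
    word_tag = defaultdict(Counter)
    for words, tags in labeled:
        for w in words:
            for t in tags:
                word_tag[w][t] += 1
    return word_tag


def predict(word_tag, words):
    """Normalized tag scores for a repository (they sum to 1 when any word is known)."""
    scores = Counter()
    for w in words:
        scores.update(word_tag.get(w, Counter()))
    total = sum(scores.values())
    return {t: c / total for t, c in scores.items()} if total else {}


def tag_correlation(labeled):
    """Estimate P(b | a) from how often tags a and b appear on the same repository."""
    single, pair = Counter(), Counter()
    for _, tags in labeled:
        single.update(tags)
        for a in tags:
            for b in tags:
                if a != b:
                    pair[(a, b)] += 1
    return {(a, b): n / single[a] for (a, b), n in pair.items()}


def extend(pred, corr, threshold=0.5):
    """Extender: add tags strongly correlated with predicted tags as candidates."""
    cands = dict(pred)
    for (a, b), p in corr.items():
        if a in pred and b not in cands and p >= threshold:
            cands[b] = 0.0  # no direct predictor evidence for b
    return cands


def evaluate(tag, pred, corr, w=(-2.0, 4.0, 3.0)):
    """Evaluator: logistic model over two features; the weights are illustrative, not learned."""
    support = max((corr.get((a, tag), 0.0) for a in pred if a != tag), default=0.0)
    z = w[0] + w[1] * pred.get(tag, 0.0) + w[2] * support
    return 1.0 / (1.0 + math.exp(-z))


def recommend(word_tag, corr, words, k=5, alpha=0.5):
    """Integrator: blend predictor and evaluator scores and return the top-k tags."""
    pred = predict(word_tag, words)
    cands = extend(pred, corr)
    final = {t: alpha * pred.get(t, 0.0) + (1 - alpha) * evaluate(t, pred, corr) for t in cands}
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(final, key=lambda t: (-final[t], t))[:k]


def self_train(labeled, unlabeled, rounds=2, k=2):
    """Semi-supervised loop: newly tagged repositories join the next round's training data."""
    labeled = list(labeled)
    for _ in range(rounds):
        word_tag, corr = train_predictor(labeled), tag_correlation(labeled)
        newly = [(words, set(recommend(word_tag, corr, words, k))) for words in unlabeled]
        labeled += [(words, tags) for words, tags in newly if tags]
        unlabeled = [words for words, tags in newly if not tags]
    return labeled


def recall_at_k(recommended, truth, k):
    """Fraction of a repository's true tags that appear in its top-k recommendations."""
    return len(set(recommended[:k]) & set(truth)) / len(truth)
```

`self_train` adds every newly tagged repository to the training set, which is the simplest choice. A practical system would probably admit only high-confidence recommendations to limit error propagation across rounds. The abstract does not say which selection criterion SemiTagRec uses.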

Author information

Authors and Affiliations

Institute of Software, Chinese Academy of Sciences, Beijing, 100190, China
Wei Chen, Jia-Hong Zhou, Jia-Xin Zhu, Guo-Quan Wu & Jun Wei
University of Chinese Academy of Sciences, Beijing, 100049, China
Wei Chen, Jia-Hong Zhou, Jia-Xin Zhu, Guo-Quan Wu & Jun Wei
State Key Laboratory of Computer Sciences, Institute of Software, Chinese Academy of Sciences, Beijing, 100190, China
Guo-Quan Wu & Jun Wei

Corresponding author

Correspondence to Wei Chen.

Electronic supplementary material

ESM 1

(PDF 80 kb)

About this article

Cite this article

Chen, W., Zhou, JH., Zhu, JX. et al. Semi-Supervised Learning Based Tag Recommendation for Docker Repositories. J. Comput. Sci. Technol. 34, 957–971 (2019). https://doi.org/10.1007/s11390-019-1954-4

Received: 28 February 2019
Revised: 12 July 2019
Published: 06 September 2019
Issue Date: September 2019
DOI: https://doi.org/10.1007/s11390-019-1954-4
