Abstract
Docker has become the mainstream technology for providing reusable software artifacts by packaging applications, dependencies, and execution environments into images, which lets developers build and deploy applications easily. A large number of reusable Docker repositories are now available in online open source communities, especially Docker Hub and Docker Store. Reusing these artifacts effectively requires a good understanding of them, and semantic tags provide one way to gain it. However, the communities do not support tags well, and little training data is available. This paper addresses the problem by proposing SemiTagRec, a semi-supervised learning based tag recommendation approach for Docker repositories. SemiTagRec contains four components: (1) the Predictor calculates the probabilities of assigning tags to Docker repositories; (2) the Extender introduces new tags as candidates based on tag correlation analysis; (3) the Evaluator measures the candidate tags; and (4) the Integrator combines the results of the Predictor and the Evaluator and takes the tags with high scores as the final result. SemiTagRec then uses the newly tagged repositories together with the original ones as training data for the next round of training. In this iterative manner, SemiTagRec trains the Predictor with the cumulative labeled data set and the extended tag vocabulary to achieve high tag recommendation accuracy. We evaluated SemiTagRec experimentally by comparing it with related approaches. The results show that SemiTagRec outperforms the other approaches in terms of Recall@5 and Recall@10.
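The iterative loop described in the abstract can be illustrated with a minimal Python sketch. Only the four-component structure and the retraining on original plus newly tagged repositories come from the abstract. Everything else is an assumption made for illustration: the word-count predictor, the co-occurrence based extender, the literal name-match evaluator, the weighted-sum integrator, and all function names and parameters (`alpha`, `threshold`, `k`, `rounds`). The paper's actual models may differ.

```python
from collections import Counter, defaultdict


def train_predictor(labeled):
    """Count, per tag, how many tagged repositories contain each word (assumed model)."""
    word_tag = defaultdict(Counter)
    for words, tags in labeled:
        for tag in tags:
            word_tag[tag].update(set(words))
    tag_count = Counter(tag for _, tags in labeled for tag in tags)
    return word_tag, tag_count


def predict(model, words):
    """Predictor: average, over the repository's words, of P(word | tag)."""
    word_tag, tag_count = model
    uniq = set(words)
    return {tag: sum(word_tag[tag][w] for w in uniq) / (tag_count[tag] * len(uniq))
            for tag in tag_count}


def extend(labeled, scores, threshold):
    """Extender: add tags that co-occur with confidently predicted tags."""
    cooc = defaultdict(Counter)
    for _, tags in labeled:
        for a in tags:
            for b in tags:
                if a != b:
                    cooc[a][b] += 1
    confident = {tag for tag, s in scores.items() if s >= threshold}
    candidates = set(confident)
    for tag in confident:
        candidates.update(cooc[tag])
    return candidates


def evaluate(words, candidates):
    """Evaluator: 1.0 if the tag name appears literally in the text, else 0.0."""
    uniq = set(words)
    return {tag: 1.0 if tag in uniq else 0.0 for tag in candidates}


def integrate(pred, evald, alpha, k):
    """Integrator: weighted sum of both scores, then keep the top-k tags."""
    combined = {tag: alpha * pred.get(tag, 0.0) + (1 - alpha) * evald.get(tag, 0.0)
                for tag in set(pred) | set(evald)}
    ranked = sorted(combined, key=lambda tag: (-combined[tag], tag))
    return ranked[:k]


def semi_tag_rec_sketch(seed, unlabeled, rounds=2, k=2, alpha=0.5, threshold=0.5):
    """Retrain each round on the seed data plus the repositories tagged so far."""
    labeled = list(seed)
    results = {}
    for _ in range(rounds):
        model = train_predictor(labeled)
        newly_tagged = []
        for name, words in unlabeled:
            pred = predict(model, words)
            candidates = extend(labeled, pred, threshold)
            evald = evaluate(words, candidates)
            results[name] = integrate(pred, evald, alpha, k)
            newly_tagged.append((words, results[name]))
        labeled = list(seed) + newly_tagged
    return results


if __name__ == "__main__":
    # Toy data, invented for this example: (description words, tags).
    seed = [
        (["nginx", "web", "server", "proxy"], ["nginx", "web"]),
        (["apache", "web", "server", "http"], ["web", "http"]),
        (["mysql", "database", "sql"], ["database", "mysql"]),
    ]
    unlabeled = [("repo-a", ["nginx", "reverse", "proxy", "server"])]
    print(semi_tag_rec_sketch(seed, unlabeled))
```

On this toy input the untagged repository receives the tags `nginx` and `web` in both rounds. In a realistic setting each component would be a learned model and the loop would stop on a convergence or budget criterion rather than a fixed round count.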
Acknowledgement
The authors thank the participants in this work for their contributions and the reviewers for their comments. This work was partially supported by the National Key R&D Program of China under Grant No. 2016YFB1000803 and the National Natural Science Foundation of China under Grant No. 61732019.
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Zhou, J., Chen, W., Wu, G., Wei, J. (2019). SemiTagRec: A Semi-supervised Learning Based Tag Recommendation Approach for Docker Repositories. In: Peng, X., Ampatzoglou, A., Bhowmik, T. (eds) Reuse in the Big Data Era. ICSR 2019. Lecture Notes in Computer Science(), vol 11602. Springer, Cham. https://doi.org/10.1007/978-3-030-22888-0_10
DOI: https://doi.org/10.1007/978-3-030-22888-0_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-22887-3
Online ISBN: 978-3-030-22888-0
eBook Packages: Computer Science, Computer Science (R0)