SemiTagRec: A Semi-supervised Learning Based Tag Recommendation Approach for Docker Repositories

  • Conference paper

Reuse in the Big Data Era (ICSR 2019)

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 11602)

Abstract

Docker has become the mainstream technology for providing reusable software artifacts by packaging applications, dependencies, and execution environments into images. Developers can easily build and deploy their applications using Docker. A large number of reusable Docker repositories are currently available in online open source communities, especially Docker Hub and Docker Store. Reusing these artifacts effectively requires a good understanding of them, and semantic tags offer a way to provide it. However, the communities do not support tags well, and little training data is available. This paper addresses the problem by proposing SemiTagRec, a semi-supervised learning based tag recommendation approach for Docker repositories. SemiTagRec contains four components. (1) The predictor calculates the probabilities of assigning tags to Docker repositories. (2) The extender introduces new tags as candidates based on tag correlation analysis. (3) The evaluator measures the candidate tags. (4) The integrator combines the results of the predictor and the evaluator, and then takes the tags with high scores as the final result. SemiTagRec uses the newly tagged repositories together with the original ones as training data for the next round of training. In this iterative manner, SemiTagRec trains the predictor with the cumulative labeled data set and the extended tag vocabulary to achieve high tag recommendation accuracy. We evaluated SemiTagRec experimentally by comparing it with related approaches. The results show that SemiTagRec outperforms the other approaches in terms of Recall@5 and Recall@10.
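
The iterative loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the models, so the word-count predictor, the co-occurrence based extender and evaluator, the weight `alpha`, and all function names here are assumptions chosen only to show how the four components and the retraining rounds fit together. The `recall_at_k` helper uses the plain definition (hits in the top k divided by the number of true tags); the paper may use a variant.

```python
from collections import Counter, defaultdict

def train_predictor(labeled):
    """Toy predictor (assumption): count how often each word appears with each tag."""
    model = defaultdict(Counter)
    for words, tags in labeled:
        for w in words:
            for t in tags:
                model[w][t] += 1
    return model

def predict(model, words):
    """Predictor: normalised tag scores for one repository description."""
    scores = Counter()
    for w in words:
        if w in model:
            scores.update(model[w])
    total = sum(scores.values())
    return {t: c / total for t, c in scores.items()} if total else {}

def cooccurrence(labeled):
    """Tag correlation (assumption): raw co-occurrence counts between tags."""
    co = defaultdict(Counter)
    for _, tags in labeled:
        for a in tags:
            for b in tags:
                if a != b:
                    co[a][b] += 1
    return co

def extend(pred, co):
    """Extender: propose tags that co-occur with the predicted tags."""
    return {b for a in pred if a in co for b in co[a]}

def evaluate(candidates, pred, co):
    """Evaluator: score candidates by co-occurrence, weighted by prediction score."""
    scores = {}
    for c in candidates:
        s = 0.0
        for t, p in pred.items():
            n = sum(co[t].values()) if t in co else 0
            if n:
                s += p * co[t][c] / n
        scores[c] = s
    return scores

def integrate(pred, cand_scores, alpha=0.5, k=5):
    """Integrator: weighted sum of both scores, then keep the k best tags."""
    tags = set(pred) | set(cand_scores)
    final = {t: alpha * pred.get(t, 0.0) + (1 - alpha) * cand_scores.get(t, 0.0)
             for t in tags}
    return sorted(final, key=lambda t: (-final[t], t))[:k]

def semi_tag_rec(labeled, unlabeled, rounds=2, k=2):
    """Each round retrains on the cumulative labeled set, then tags more repositories."""
    labeled, pending = list(labeled), list(unlabeled)
    for _ in range(rounds):
        model, co = train_predictor(labeled), cooccurrence(labeled)
        still_pending = []
        for words in pending:
            pred = predict(model, words)
            if not pred:  # nothing known about this repository yet
                still_pending.append(words)
                continue
            tags = integrate(pred, evaluate(extend(pred, co), pred, co), k=k)
            labeled.append((words, set(tags)))
        pending = still_pending
    return labeled

def recall_at_k(recommended, truth, k):
    """Fraction of the true tags that appear among the top-k recommendations."""
    return len(set(recommended[:k]) & set(truth)) / len(truth)
```

Calling `semi_tag_rec` with a few `(words, tags)` pairs and a list of untagged word sets returns the enlarged labeled set. Repositories whose words never appear in the training data stay untagged in this sketch; a real system would need a richer predictor to handle them.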

Notes

  1. https://hub.docker.com/r/nytimes/nginx-vod-module

  2. https://hub.docker.com/r/vhtec/jupyter-docker

  3. https://hub.docker.com/r/mtinx/tensorflow

  4. https://hub.docker.com/r/mitsutaka/mediaproxy-relay

Acknowledgement

The authors would like to thank the participants in our work for their contributions and the reviewers for their comments. This work was partially supported by the National Key R&D Program of China under Grant No. 2016YFB1000803 and the National Natural Science Foundation of China under Grant No. 61732019.

Author information

Corresponding author

Correspondence to Wei Chen.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Zhou, J., Chen, W., Wu, G., Wei, J. (2019). SemiTagRec: A Semi-supervised Learning Based Tag Recommendation Approach for Docker Repositories. In: Peng, X., Ampatzoglou, A., Bhowmik, T. (eds) Reuse in the Big Data Era. ICSR 2019. Lecture Notes in Computer Science, vol 11602. Springer, Cham. https://doi.org/10.1007/978-3-030-22888-0_10

  • DOI: https://doi.org/10.1007/978-3-030-22888-0_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-22887-3

  • Online ISBN: 978-3-030-22888-0

  • eBook Packages: Computer Science, Computer Science (R0)
