MF-TagRec: Multi-feature Fused Tag Recommendation for GitHub

  • Conference paper
PRICAI 2022: Trends in Artificial Intelligence (PRICAI 2022)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 13631))

Abstract

GitHub is one of the most popular hosting platforms for open-source projects, where tags are widely used to facilitate software organization and retrieval. However, inadequate and low-quality tags on GitHub hinder users from searching for and retrieving the projects they want. In this paper, we propose MF-TagRec, an automatic tag recommendation method that extracts multiple features from a project's Readme document, programming languages, and dependency package tags. We capture the topics and global semantics of Readme documents as text features, and represent programming languages and dependency package tags as word vector features. We construct a convolutional neural network that takes the text and word vector features as input and predicts the most relevant tags for untagged or sparsely tagged projects. We evaluate MF-TagRec on a real-world dataset, GitHubDepDataSet, against five baselines. The results show that MF-TagRec achieves a Recall@5 of 0.756 and a Recall@10 of 0.864, outperforming the baselines.
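
To make the reported metric concrete, the following is a minimal sketch of how Recall@k is commonly computed for tag recommendation. The abstract does not state the exact definition used, so this assumes the plain form: for each project, the fraction of its ground-truth tags that appear among the top-k recommended tags, averaged over all projects. The function names and the example tags are hypothetical and are not taken from the paper.

```python
from typing import Sequence, Set


def recall_at_k(ranked_tags: Sequence[str], true_tags: Set[str], k: int) -> float:
    """Fraction of a project's true tags found in the top-k recommendations.

    Assumed definition: |top-k ∩ true| / |true|. Some papers cap the
    denominator at k; that variant is not implemented here.
    """
    if not true_tags:
        raise ValueError("true_tags must be non-empty")
    top_k = set(ranked_tags[:k])
    return len(top_k & true_tags) / len(true_tags)


def mean_recall_at_k(
    predictions: Sequence[Sequence[str]],
    ground_truth: Sequence[Set[str]],
    k: int,
) -> float:
    """Average the per-project Recall@k over a collection of projects."""
    if len(predictions) != len(ground_truth) or not predictions:
        raise ValueError("inputs must be non-empty and of equal length")
    scores = [recall_at_k(p, t, k) for p, t in zip(predictions, ground_truth)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical ranked recommendations and ground-truth tags for two projects.
    preds = [["python", "ml", "web"], ["cli", "rust", "parser"]]
    truth = [{"python", "web", "api"}, {"rust"}]
    print(mean_recall_at_k(preds, truth, k=2))
```

Under this definition, a larger k can only keep or raise the score, which is consistent with Recall@10 being higher than Recall@5 in the results above.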

Acknowledgements

This work is supported by the National Natural Science Foundation of China under Grant No. 62172451, by the Scientific and Technological Innovation 2030-Major Project of New Generation Artificial Intelligence under Grant No. 2020AAA0109601, by the Open Research Projects of Zhejiang Lab under Grant No. 2022KG0AB01, and in part by the Natural Science Foundation of Hunan under Grant Nos. 2020JJ4754 and 2020JJ5775.

Author information

Corresponding author

Correspondence to Tingxuan Chen.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Yang, L., Yang, R., Chen, T., Fei, H., Tang, J. (2022). MF-TagRec: Multi-feature Fused Tag Recommendation for GitHub. In: Khanna, S., Cao, J., Bai, Q., Xu, G. (eds) PRICAI 2022: Trends in Artificial Intelligence. PRICAI 2022. Lecture Notes in Computer Science, vol 13631. Springer, Cham. https://doi.org/10.1007/978-3-031-20868-3_2

  • DOI: https://doi.org/10.1007/978-3-031-20868-3_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20867-6

  • Online ISBN: 978-3-031-20868-3

  • eBook Packages: Computer Science, Computer Science (R0)
