
Tag recommendation for open source software

  • Research Article
  • Frontiers of Computer Science

Abstract

Open source software has become highly popular and is of great importance for most software engineering activities. To facilitate software organization and retrieval, tagging is used extensively in open source communities. However, finding the desired software through tags in communities such as Freecode and Ohloh is still challenging because of insufficient tagging. In this paper, we propose TRG (tag recommendation based on semantic graph), a novel approach to discovering and enriching the tags of open source software. First, we propose a semantic graph to model the semantic correlations between tags and the words in software descriptions. Then, based on this graph, we design an effective algorithm to recommend tags for software. Through comprehensive experiments on large-scale open source software datasets, comparing against several typical related approaches, we demonstrate the effectiveness and efficiency of our method in recommending appropriate tags.
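The abstract describes TRG only at a high level. As a rough illustration of the general idea (linking description words to tags through weighted edges, then ranking tags by how strongly a new project's words connect to them), here is a minimal Python sketch. It is not the TRG algorithm: the edge weight (an estimate of P(tag | word) from co-occurrence counts), the summed-weight scoring rule, the stopword list, and the function names `tokenize`, `build_word_tag_graph` and `recommend_tags` are all hypothetical choices made for illustration.

```python
import re
from collections import Counter, defaultdict

# Simplified word-tag co-occurrence sketch; NOT the paper's TRG algorithm.
# Hypothetical toy stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "for", "of", "to", "in", "with", "is"}


def tokenize(text):
    """Lowercase, split on non-alphanumerics, and drop stopwords."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {w for w in words if w not in STOPWORDS}


def build_word_tag_graph(projects):
    """Build weighted edges word -> tag, weighted by an estimate of P(tag | word).

    `projects` is a list of (description, tags) pairs. Each edge weight is the
    fraction of projects whose description contains `word` that carry `tag`.
    """
    word_count = Counter()
    pair_count = defaultdict(Counter)
    for description, tags in projects:
        for w in tokenize(description):
            word_count[w] += 1
            for t in set(tags):
                pair_count[w][t] += 1
    return {
        w: {t: c / word_count[w] for t, c in tag_counts.items()}
        for w, tag_counts in pair_count.items()
    }


def recommend_tags(graph, description, k=3, exclude=()):
    """Rank tags by the summed edge weights from the description's words.

    Tags in `exclude` (e.g. tags a project already has) are skipped, so the
    result only contains new tags that could enrich the project.
    Ties are broken alphabetically to keep the output deterministic.
    """
    scores = Counter()
    for w in tokenize(description):
        for t, weight in graph.get(w, {}).items():
            if t not in exclude:
                scores[t] += weight
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:k]]


# Toy usage with made-up project descriptions and tags.
training = [
    ("A fast web server for static files", ["web", "http", "server"]),
    ("Lightweight HTTP client library", ["http", "library"]),
    ("Relational database engine written in C", ["database", "sql"]),
]
graph = build_word_tag_graph(training)
print(recommend_tags(graph, "Minimal web server with HTTP caching"))
# prints ['http', 'server', 'web']
```

Passing a project's existing tags as `exclude` turns the same ranking into a tag-enrichment step, which mirrors the "discovering and enriching tags" goal stated above; how TRG itself builds and traverses its semantic graph is described in the full paper, not in this sketch.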


Author information

Correspondence to Tao Wang.

Additional information

Tao Wang received his BS and MS degrees in Computer Science from the National University of Defense Technology (NUDT) in 2007 and 2010, respectively. He is now a PhD candidate in Computer Science at NUDT. His research interests include open source software engineering, machine learning, data mining, and knowledge discovery in open source software.

Huaimin Wang received his PhD in Computer Science from the National University of Defense Technology (NUDT) in 1992. He is now a professor and chief engineer in the Department of Educational Affairs, NUDT. His honors include the title of “Chang Jiang Scholars Program” professor and Distinguished Young Scholar. He has published more than 100 research papers in peer-reviewed international conferences and journals. His current research interests include middleware, software agents, and trustworthy computing.

Gang Yin received his PhD degree in Computer Science from the National University of Defense Technology (NUDT) in 2006. He is now an associate professor at NUDT. He has worked on several major research projects, including national 973 and 863 projects. He has published more than 60 research papers in international conferences and journals. His current research interests include distributed computing, information security, software engineering, and machine learning.

Charles X. Ling earned both his MS and PhD in Computer and Information Science at the University of Pennsylvania, and is now a faculty member in Computer Science at Western University. He has served as an Associate Editor of IEEE TKDE and ACM TIST and as Panel Co-chair of ACM SIGKDD’12, among other roles. He has published over 120 research papers in peer-reviewed conferences and journals such as IJCAI and TKDE. He is a Senior Member of IEEE and a Lifetime Member of AAAI.

Xiao Li received his BS and MS degrees in Computer Science from the National University of Defense Technology in 2006 and 2008, respectively. He is currently a PhD candidate in the Department of Computer Science at The University of Western Ontario. His research interests include data mining, machine learning, and related real-world applications.

Peng Zou is a professor and PhD supervisor at the National University of Defense Technology, and now works at the Academy of Equipment. He has directed several major research projects and has published many research papers in peer-reviewed international conferences and journals. His research interests include networking, information security, distributed computing, and software engineering.

About this article

Cite this article

Wang, T., Wang, H., Yin, G. et al. Tag recommendation for open source software. Front. Comput. Sci. 8, 69–82 (2014). https://doi.org/10.1007/s11704-013-2394-x

  • DOI: https://doi.org/10.1007/s11704-013-2394-x
