A Hybrid Approach for Tag Hierarchy Construction

  • Conference paper
New Opportunities for Software Reuse (ICSR 2018)

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 10826)

Abstract

Open source resources play an increasingly important role in software engineering for reuse. However, the rapidly growing scale of these resources makes them difficult to manage and locate. In this study, we propose a hybrid approach for automatic tag hierarchy construction that combines tag co-occurrence relations with domain knowledge to build and optimize the hierarchy. We first calculate the generality of each tag from its co-occurrence relationships with other tags and construct the hierarchy based on this generality. We then leverage the domain knowledge in existing hierarchical categories to optimize and improve the final hierarchy. We select 8064 projects from the Openhub community and 10703 posts from the StackOverflow community as the original data and use information from the SourceForge community as the domain knowledge. We conduct extensive experiments and evaluate our approach using WordNet and the F-measure. The results show that our approach outperforms other approaches, with accuracy and recall both exceeding 90%.
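The first stage described in the abstract (scoring tag generality from co-occurrence, then building a hierarchy from that score) can be illustrated with a minimal sketch. This is not the paper's actual algorithm: the generality proxy (number of distinct co-occurring tags, with frequency as a tie-breaker), the attachment rule (conditional co-occurrence probability), the `threshold` parameter, and all function names are assumptions made for illustration only. The domain-knowledge optimization stage is not covered.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence(items):
    """Count tag frequencies and pairwise co-occurrences over tagged items."""
    freq = Counter()
    co = defaultdict(Counter)
    for tags in items:
        tags = set(tags)
        freq.update(tags)
        for a, b in combinations(sorted(tags), 2):
            co[a][b] += 1
            co[b][a] += 1
    return freq, co


def build_hierarchy(items, threshold=0.5, root="ROOT"):
    """Return a {tag: parent} map built greedily from general to specific tags.

    Assumed generality proxy: (distinct co-occurring tags, frequency).
    Assumed attachment rule: pick the already-placed tag with the highest
    P(candidate | tag) = co[tag][candidate] / freq[tag]; fall back to the
    root when no candidate reaches the threshold.
    """
    freq, co = cooccurrence(items)
    order = sorted(freq, key=lambda t: (-len(co[t]), -freq[t], t))
    parent, placed = {}, []
    for tag in order:
        best, best_score = root, threshold
        for cand in placed:
            score = co[tag][cand] / freq[tag]
            # ">=" lets ties go to the later, more specific candidate.
            if score >= best_score:
                best, best_score = cand, score
        parent[tag] = best
        placed.append(tag)
    return parent


if __name__ == "__main__":
    # Toy data: each inner list is the tag set of one project or post.
    items = [
        ["java", "web", "spring"],
        ["java", "web"],
        ["java", "android"],
        ["python", "web", "django"],
        ["python", "web"],
    ]
    for tag, par in build_hierarchy(items).items():
        print(f"{tag} -> {par}")
```

On the toy data, the most widely co-occurring tag ends up directly under the root and narrower tags attach beneath the tags they most often appear with. A real pipeline would replace both the proxy and the attachment rule with the measures defined in the paper.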


Acknowledgement

Our approach is publicly available to support further research on tag hierarchy construction and to make it easier for others to reproduce our experiments: https://github.com/Kaka727/Tag_Hierarchy_Construction_WSW.

Author information

Corresponding authors

Correspondence to Shangwen Wang, Tao Wang or Xiaoguang Mao.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

Cite this paper

Wang, S., Wang, T., Mao, X., Yin, G., Yu, Y. (2018). A Hybrid Approach for Tag Hierarchy Construction. In: Capilla, R., Gallina, B., Cetina, C. (eds) New Opportunities for Software Reuse. ICSR 2018. Lecture Notes in Computer Science, vol 10826. Springer, Cham. https://doi.org/10.1007/978-3-319-90421-4_4

  • DOI: https://doi.org/10.1007/978-3-319-90421-4_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-90420-7

  • Online ISBN: 978-3-319-90421-4

  • eBook Packages: Computer Science, Computer Science (R0)
