Abstract
Open source resources play an increasingly important role in software reuse. However, their rapidly growing scale makes them difficult to manage and locate. In this study, we propose a hybrid approach to automatic tag hierarchy construction that combines tag co-occurrence relations with domain knowledge to build and optimize the hierarchy. We first calculate the generality of each tag from its co-occurrence relationships with other tags and construct the hierarchy based on this generality. We then leverage the domain knowledge contained in existing hierarchical categories to optimize and refine the final hierarchy. We select 8064 projects from the Openhub community and 10703 posts from the StackOverflow community as the original data, and use information from the SourceForge community as the domain knowledge. We conduct extensive experiments and evaluate our approach using WordNet and the F-measure. The results show that our approach outperforms the compared approaches, with both accuracy and recall exceeding 90%.
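The abstract describes the first stage (generality calculation followed by hierarchy construction) only at a high level. The sketch below shows one possible way to build a generality-ordered hierarchy from tag co-occurrence; it is an illustration under stated assumptions, not the authors' implementation. Assumptions: generality is approximated by co-occurrence degree (the number of distinct tags a tag appears with, ties broken by frequency); each tag is attached under the already-placed tag that maximizes the conditional probability P(parent | child) above a threshold, in the spirit of Heymann and Garcia-Molina's greedy algorithm; and the SourceForge-based optimization step is omitted. The function names, the `threshold` value, the `ROOT` label, and the toy data are hypothetical. The edge-level F-measure is likewise an assumed evaluation, not the paper's WordNet-based protocol.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence(projects):
    """Count how often each tag and each unordered tag pair appear across projects."""
    tag_count, pair_count = Counter(), Counter()
    for tags in projects:
        unique = sorted(set(tags))
        tag_count.update(unique)
        pair_count.update(combinations(unique, 2))  # pairs stored as sorted tuples
    return tag_count, pair_count


def generality(tag_count, pair_count):
    """Assumed proxy: (number of distinct co-occurring tags, tag frequency)."""
    neighbours = defaultdict(set)
    for a, b in pair_count:
        neighbours[a].add(b)
        neighbours[b].add(a)
    return {t: (len(neighbours[t]), tag_count[t]) for t in tag_count}


def build_hierarchy(projects, threshold=0.5, root="ROOT"):
    """Visit tags from most to least general; attach each under the placed tag
    with the highest P(parent | child) strictly above `threshold`, else under root."""
    tag_count, pair_count = cooccurrence(projects)
    gen = generality(tag_count, pair_count)
    order = sorted(tag_count, key=lambda t: (-gen[t][0], -gen[t][1], t))
    parent, placed = {}, []
    for tag in order:
        best, best_score = root, threshold
        # Reverse order: on ties, the most specific (latest placed) candidate wins.
        for candidate in reversed(placed):
            key = tuple(sorted((tag, candidate)))
            score = pair_count.get(key, 0) / tag_count[tag]  # P(candidate | tag)
            if score > best_score:
                best, best_score = candidate, score
        parent[tag] = best
        placed.append(tag)
    return parent


def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def edge_scores(predicted, gold_edges, root="ROOT"):
    """Precision, recall and F1 of predicted (child, parent) edges, ignoring root links."""
    pred = {(child, par) for child, par in predicted.items() if par != root}
    hits = len(pred & gold_edges)
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(gold_edges) if gold_edges else 0.0
    return precision, recall, f_measure(precision, recall)


# Toy usage with hypothetical project tags.
projects = [
    ["web", "framework", "django"],
    ["web", "framework", "flask"],
    ["web", "framework", "django"],
    ["web", "server"],
    ["database", "sql"],
    ["database", "sql", "web"],
]
parents = build_hierarchy(projects)
gold = {("framework", "web"), ("django", "framework"), ("sql", "database"), ("orm", "database")}
precision, recall, f1 = edge_scores(parents, gold)
```

On the toy data, `web` and `database` end up at the root, `framework` and `server` under `web`, `django` and `flask` under `framework`, and `sql` under `database`. Ordering tags by generality before assigning parents guarantees that a candidate parent is always at least as general as the child under the chosen proxy.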
Acknowledgement
Our approach is publicly available to support further research on tag hierarchy construction and to make it easier for others to reproduce our experiments: https://github.com/Kaka727/Tag_Hierarchy_Construction_WSW.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Wang, S., Wang, T., Mao, X., Yin, G., Yu, Y. (2018). A Hybrid Approach for Tag Hierarchy Construction. In: Capilla, R., Gallina, B., Cetina, C. (eds) New Opportunities for Software Reuse. ICSR 2018. Lecture Notes in Computer Science, vol 10826. Springer, Cham. https://doi.org/10.1007/978-3-319-90421-4_4
DOI: https://doi.org/10.1007/978-3-319-90421-4_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-90420-7
Online ISBN: 978-3-319-90421-4
eBook Packages: Computer Science, Computer Science (R0)