Weighting-based semantic similarity measure based on topological parameters in semantic taxonomy

Published online by Cambridge University Press: 04 June 2018

ABDULGABBAR SAIF
Affiliation: Department of Computer Information Systems, Faculty of IT&CS, University of Saba Region, Marib, Yemen (e-mail: agmssaif@gmail.com, aghurieb@usr.ac)
UMMI ZAKIAH ZAINODIN
Affiliation: Center for Artificial Intelligence Technology, Universiti Kebangsaan Malaysia, Malaysia (e-mail: ummizakiahzainodin@gmail.com, nazlia@ukm.edu.my)
NAZLIA OMAR
Affiliation: Center for Artificial Intelligence Technology, Universiti Kebangsaan Malaysia, Malaysia (e-mail: ummizakiahzainodin@gmail.com, nazlia@ukm.edu.my)
ABDULLAH SAEED GHAREB
Affiliation: Department of Computer Information Systems, Faculty of IT&CS, University of Saba Region, Marib, Yemen (e-mail: agmssaif@gmail.com, aghurieb@usr.ac)

Abstract

Semantic measures are used to address a range of problems in several research areas, such as artificial intelligence, natural language processing, knowledge engineering, bioinformatics, and information retrieval. Hierarchical feature-based semantic measures have been proposed to estimate the semantic similarity between two concepts/words from features extracted from the semantic taxonomy (hierarchy) of a given lexical source. The central issue with these measures is their constant-weighting assumption: all elements in the semantic representation of a concept are treated as equally relevant. In this paper, a new weighting-based semantic similarity measure is proposed to address the issues of hierarchical feature-based measures. Four mechanisms are introduced to weight the relevance of each feature in a concept's semantic representation by using topological parameters of the semantic taxonomy (edge, depth, descendants, and density). Using the WordNet taxonomy, the proposed measure is evaluated for word semantic similarity on four gold-standard datasets. Experimental results show that the proposed measure outperforms hierarchical feature-based semantic measures on all the datasets. Comparison results also suggest that the proposed measure is more effective than information-content measures in measuring semantic similarity.
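To make the constant-weighting assumption concrete, the Python sketch below contrasts an unweighted feature-overlap similarity with a weighted one on a toy taxonomy. It is not the authors' measure: the taxonomy, the use of ancestor sets as features, the Jaccard-style ratio, and the `1 + depth` weight are all illustrative assumptions. Only the general idea of weighting features by topological parameters comes from the abstract; the sketch computes three of them (depth, descendant count, and a child-count proxy for density).

```python
from typing import Callable, Dict, List, Set

# Hypothetical toy "is-a" taxonomy (child -> parents); invented for illustration, not WordNet.
PARENTS: Dict[str, List[str]] = {
    "entity": [],
    "object": ["entity"],
    "animal": ["object"],
    "vehicle": ["object"],
    "dog": ["animal"],
    "cat": ["animal"],
    "car": ["vehicle"],
}


def ancestors(c: str) -> Set[str]:
    """Features of a concept: the concept itself plus all its taxonomic ancestors."""
    result = {c}
    stack = [c]
    while stack:
        for p in PARENTS[stack.pop()]:
            if p not in result:
                result.add(p)
                stack.append(p)
    return result


def depth(c: str) -> int:
    """Length (in edges) of the longest is-a path from c up to the root."""
    return 0 if not PARENTS[c] else 1 + max(depth(p) for p in PARENTS[c])


def descendants(c: str) -> Set[str]:
    """All concepts that have c as an ancestor."""
    return {x for x in PARENTS if x != c and c in ancestors(x)}


def density(c: str) -> int:
    """Number of direct children of c (a simple, assumed proxy for local density)."""
    return sum(1 for ps in PARENTS.values() if c in ps)


def feature_sim(a: str, b: str, weight: Callable[[str], float]) -> float:
    """Weighted Jaccard-style ratio: weight of shared features / weight of all features."""
    fa, fb = ancestors(a), ancestors(b)
    shared = sum(weight(f) for f in fa & fb)
    total = sum(weight(f) for f in fa | fb)
    return shared / total


def constant_weight(f: str) -> float:
    """Constant weighting: every feature is equally relevant."""
    return 1.0


def depth_weight(f: str) -> float:
    """Hypothetical topological weight: deeper (more specific) features count more."""
    return 1.0 + depth(f)


if __name__ == "__main__":
    for a, b in [("dog", "cat"), ("dog", "car")]:
        print(a, b,
              round(feature_sim(a, b, constant_weight), 3),
              round(feature_sim(a, b, depth_weight), 3))
    print(depth("dog"), len(descendants("entity")), density("animal"))
```

With constant weighting, every shared ancestor counts equally (0.6 for (dog, cat) versus 0.333 for (dog, car)). The hypothetical depth weight lets the more specific shared feature "animal" dominate, which raises the ratio between the two scores (0.429 versus 0.176). Any other topological weight could be swapped into `feature_sim` the same way.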

Type: Article
Copyright © Cambridge University Press 2018

Footnotes

* This work was partially funded by the Ministry of Higher Education in Malaysia under grant no. FRGS/1/2016/ICT02/UKM/02/11. The first author would like to thank the University of Saba Region for its financial support.
