Abstract
We propose a hypothesis: the semantic distance between synonyms or near-synonyms should behave like a distance in a metric space. A metric space is a set on which a notion of distance (called a metric) between elements is defined. A metric must satisfy three properties: (i) Identity of indiscernibles: the distance between two elements is zero if and only if they are the same element. (ii) Symmetry: the distance from A to B equals the distance from B to A. (iii) Triangle inequality: for any three elements A, B and C, the sum of any two of the pairwise distances is greater than or equal to the third. The first two properties are intuitively reasonable. To test the third, we compute word similarities based on HowNet and check whether the synonyms and near-synonyms listed in Cilin Extended Edition satisfy it. The experiments show that more than 98.5% of triples (each consisting of three synonyms) satisfy the triangle inequality. Furthermore, we use the hypothesis to detect a large number of thesaurus errors.
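The triangle-inequality check over synonym triples can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how similarity is converted to distance, so the conversion `distance = 1 - similarity` is an assumption, and `check_synonym_set`, `toy_similarity`, and the words `w1` to `w4` with their scores are hypothetical stand-ins for a HowNet-based similarity function and a Cilin synonym set.

```python
from itertools import combinations


def satisfies_triangle(d_ab, d_bc, d_ac, tol=1e-9):
    """Return True if each distance is at most the sum of the other two."""
    return (
        d_ab + d_bc >= d_ac - tol
        and d_ab + d_ac >= d_bc - tol
        and d_bc + d_ac >= d_ab - tol
    )


def check_synonym_set(words, similarity):
    """Check every triple in one synonym set.

    `similarity(a, b)` is any symmetric function returning a score in [0, 1].
    Returns (violating_triples, total_triples).
    """
    def dist(a, b):
        # Assumed conversion; the source does not specify one.
        return 1.0 - similarity(a, b)

    violations = []
    total = 0
    for a, b, c in combinations(words, 3):
        total += 1
        if not satisfies_triangle(dist(a, b), dist(b, c), dist(a, c)):
            violations.append((a, b, c))
    return violations, total


# Hypothetical similarity scores for four placeholder words.
TOY_SCORES = {
    frozenset(("w1", "w2")): 0.9,
    frozenset(("w2", "w3")): 0.9,
    frozenset(("w1", "w3")): 0.1,
    frozenset(("w1", "w4")): 0.5,
    frozenset(("w2", "w4")): 0.5,
    frozenset(("w3", "w4")): 0.5,
}


def toy_similarity(a, b):
    """Look up a toy score; identical words have similarity 1."""
    if a == b:
        return 1.0
    return TOY_SCORES[frozenset((a, b))]


if __name__ == "__main__":
    violations, total = check_synonym_set(["w1", "w2", "w3", "w4"], toy_similarity)
    print(f"{len(violations)} of {total} triples violate the triangle inequality")
    for triple in violations:
        print("suspect triple:", triple)
```

Under these assumptions, a triple that fails the check marks a synonym group worth inspecting: either one of the similarity scores is off, or one of the words may not belong in the set.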
References
Saul, L., Pereira, F.: Aggregate and mixed-order Markov models for statistical language processing. In: Proceedings of EMNLP, pp. 81–89 (1997)
Lee, L.: On the Effectiveness of the Skew Divergence for Statistical Language Analysis. In: Proceedings of Artificial Intelligence and Statistics, pp. 65–72 (2001)
Chang, W., Pantel, P., Popescu, A., Gabrilovich, E.: Towards Intent-driven Bid-term Suggestion. In: Proceedings of WWW, pp. 1093–1094 (2009)
Gauch, S., Chong, M.K.: Automatic Word Similarity Detection for TREC4 Query Expansion. In: Proceedings of TREC-4, pp. 527–536 (1996)
Miller, G.A.: WordNet: A Lexical Database for English. Communications of the ACM 38(11), 39–41 (1995)
Mei, J., Zhu, Y., Gao, Y., Yin, H. (eds.): Tongyici Cilin [A Thesaurus of Chinese Words]. Commercial Press, Hong Kong (1984)
Dong, Z., Dong, Q.: HowNet and the Computation of Meaning. World Scientific Publishing Co. Inc., River Edge (2006)
Resnik, P.: Using information content to evaluate semantic similarity. In: Proceedings of IJCAI, pp. 448–453 (1995)
Jiang, J., Conrath, D.: Semantic similarity based on corpus statistics and lexical taxonomy. In: Proceedings of ROCLING, pp. 19–33 (1997)
Budanitsky, A., Hirst, G.: Evaluating WordNet-based Measures of Lexical Semantic Relatedness. Computational Linguistics 32(1), 13–47 (2006)
Pedersen, T.: WordNet::Similarity (2008). http://wn-similarity.sourceforge.net/
Liu, Q., Li, S.: Word similarity computing based on HowNet. Computational Linguistics and Chinese Language Processing 17(2), 59–76 (2002)
Firth, J.R.: A synopsis of linguistic theory, 1930–1955 (1957)
Lin, D.: Automatic retrieval and clustering of similar words. In: Proceedings of COLING/ACL, pp. 768–774 (1998)
Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient Estimation of Word Representations in Vector Space. In: Proceedings of Workshop at ICLR (2013)
Qiu, L., Wu, Y., Kang, Y.: Detect Thesaurus Errors Based on Distributional Similarity. Journal of Computational Information Systems 8(20), 8645–8652 (2012)
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Jin, P., Qiu, L., Zhu, X., Liu, P. (2014). A Hypothesis on Word Similarity and Its Application. In: Su, X., He, T. (eds) Chinese Lexical Semantics. CLSW 2014. Lecture Notes in Computer Science, vol 8922. Springer, Cham. https://doi.org/10.1007/978-3-319-14331-6_32
DOI: https://doi.org/10.1007/978-3-319-14331-6_32
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-14330-9
Online ISBN: 978-3-319-14331-6
eBook Packages: Computer Science, Computer Science (R0)