A Hypothesis on Word Similarity and Its Application

Jin, Peng; Qiu, Likun; Zhu, Xuefeng; Liu, Pengyuan

doi:10.1007/978-3-319-14331-6_32

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8922)

Included in the following conference series:

Workshop on Chinese Lexical Semantics

Abstract

A hypothesis is proposed: the semantic distance between synonyms or near-synonyms should have the same characteristics as the distance in a metric space. A metric space is a set on which a notion of distance (called a metric) between elements is defined. Three properties must hold: (i) Identity of indiscernibles: the distance is zero if and only if the two elements are the same. (ii) Symmetry: the distance between elements A and B equals the distance between B and A. (iii) Triangle inequality: given three elements A, B and C, the sum of the distances of any two pairs is greater than or equal to the distance of the remaining pair. The first two properties are intuitively reasonable. To test the third, we first compute word similarities based on HowNet and then check whether the synonyms and near-synonyms listed in Cilin Extended Edition satisfy it. The experiments show that more than 98.5% of the triples (each consisting of three synonyms) satisfy the triangle inequality. Furthermore, we detect a large number of thesaurus errors according to our hypothesis.
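
The triangle-inequality check described in the abstract can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how a HowNet similarity is converted to a distance, so the sketch assumes `distance = 1 - similarity` for similarities in [0, 1]. The names `satisfies_triangle`, `triangle_report`, `TOY_SIM` and `toy_distance`, as well as the toy similarity values, are hypothetical and not taken from HowNet or Cilin.

```python
from itertools import combinations


def satisfies_triangle(d_ab, d_bc, d_ac, eps=1e-9):
    """Return True if each distance is at most the sum of the other two."""
    return (
        d_ab + d_bc + eps >= d_ac
        and d_ab + d_ac + eps >= d_bc
        and d_bc + d_ac + eps >= d_ab
    )


def triangle_report(words, distance):
    """Check every unordered triple of words.

    Returns (fraction of triples that satisfy the inequality,
             list of violating triples).
    """
    violations = []
    total = 0
    for a, b, c in combinations(words, 3):
        total += 1
        if not satisfies_triangle(distance(a, b), distance(b, c), distance(a, c)):
            violations.append((a, b, c))
    ratio = (total - len(violations)) / total if total else 1.0
    return ratio, violations


# Hypothetical toy similarities in [0, 1] for one synonym set.
TOY_SIM = {
    frozenset(("w1", "w2")): 0.9,
    frozenset(("w2", "w3")): 0.9,
    frozenset(("w1", "w3")): 0.2,  # suspiciously low: candidate thesaurus error
    frozenset(("w1", "w4")): 0.5,
    frozenset(("w2", "w4")): 0.5,
    frozenset(("w3", "w4")): 0.5,
}


def toy_distance(a, b):
    """Assumed conversion: distance = 1 - similarity (not stated in the abstract)."""
    if a == b:
        return 0.0  # identity of indiscernibles
    return 1.0 - TOY_SIM[frozenset((a, b))]  # frozenset key makes it symmetric


ratio, violations = triangle_report(["w1", "w2", "w3", "w4"], toy_distance)
print(ratio, violations)
```

In this toy set, a triple that violates the inequality points at a pair whose similarity is inconsistent with the rest of the synonym set, which is the kind of signal the abstract uses to flag possible thesaurus errors.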

Author information

Authors and Affiliations

School of Computer Science, Leshan Normal University, Leshan, 614004, China
Peng Jin
School of Chinese Language and Literature, Ludong University, Yantai, 260045, China
Likun Qiu
Institute of Computational Linguistics, Peking University, Beijing, 100871, China
Xuefeng Zhu
Applied Linguistic Research Institute, Beijing Language and Culture University, Beijing, China
Pengyuan Liu

Corresponding author

Correspondence to Peng Jin.

Editor information

Editors and Affiliations

Xiamen University, Xiamen, Fujian, China
Xinchun Su
Central China Normal University, Wuhan, Hubei, China
Tingting He

About this paper

Cite this paper

Jin, P., Qiu, L., Zhu, X., Liu, P. (2014). A Hypothesis on Word Similarity and Its Application. In: Su, X., He, T. (eds) Chinese Lexical Semantics. CLSW 2014. Lecture Notes in Computer Science, vol 8922. Springer, Cham. https://doi.org/10.1007/978-3-319-14331-6_32

DOI: https://doi.org/10.1007/978-3-319-14331-6_32
Published: 27 December 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-14330-9
Online ISBN: 978-3-319-14331-6
eBook Packages: Computer Science, Computer Science (R0)
