A Hypothesis on Word Similarity and Its Application

  • Conference paper
Chinese Lexical Semantics (CLSW 2014)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8922)

Abstract

We propose a hypothesis: the semantic distance between synonyms or near-synonyms should behave like the distance in a metric space. A metric space is a set on which a notion of distance (called a metric) between elements is defined. The metric must satisfy three properties: (i) Identity of indiscernibles: the distance is zero if and only if the two elements are the same. (ii) Symmetry: the distance from element A to element B equals the distance from B to A. (iii) Triangle inequality: for any three elements A, B and C, the sum of any two of the pairwise distances is greater than or equal to the third. The first two properties are intuitively reasonable. To test the third, we compute word similarities based on HowNet and check whether the synonyms and near-synonyms listed in Cilin Extended Edition satisfy it. The experiments show that more than 98.5% of triples (each consisting of three synonyms) satisfy the triangle inequality. Furthermore, we detect a large number of thesaurus errors on the basis of this hypothesis.
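
The triple check described in the abstract can be sketched as follows. This is only an illustration, not the authors' implementation: the abstract does not say how a similarity score is turned into a distance, so the sketch assumes distance = 1 - similarity with similarities in [0, 1]. The function names and the toy similarity table are hypothetical; a real run would plug in a HowNet-based similarity function and the synonym sets from Cilin Extended Edition.

```python
from itertools import combinations


def satisfies_triangle(d_ab, d_bc, d_ac, tol=1e-9):
    """Return True if each distance is at most the sum of the other two."""
    return (d_ab <= d_bc + d_ac + tol
            and d_bc <= d_ab + d_ac + tol
            and d_ac <= d_ab + d_bc + tol)


def check_synonym_set(words, similarity):
    """Check every triple in one synonym set.

    Returns (fraction of triples that satisfy the inequality,
             list of violating triples).
    """
    def distance(a, b):
        # Assumption (not stated in the abstract): distance = 1 - similarity.
        return 1.0 - similarity(a, b)

    violations = []
    total = 0
    for a, b, c in combinations(words, 3):
        total += 1
        if not satisfies_triangle(distance(a, b), distance(b, c), distance(a, c)):
            violations.append((a, b, c))
    ratio = (total - len(violations)) / total if total else 1.0
    return ratio, violations


# Hypothetical, made-up similarity scores for four placeholder words.
TOY_SIM = {
    frozenset(("w1", "w2")): 0.9,
    frozenset(("w2", "w3")): 0.9,
    frozenset(("w1", "w3")): 0.2,  # far apart although both are close to w2
    frozenset(("w1", "w4")): 0.5,
    frozenset(("w2", "w4")): 0.5,
    frozenset(("w3", "w4")): 0.5,
}


def toy_similarity(a, b):
    """Symmetric lookup; identical words have similarity 1 (distance 0)."""
    return 1.0 if a == b else TOY_SIM[frozenset((a, b))]


ratio, violations = check_synonym_set(["w1", "w2", "w3", "w4"], toy_similarity)
print(ratio, violations)
```

In this toy set, only the triple (w1, w2, w3) breaks the inequality. Under the hypothesis, such a triple is a candidate for a thesaurus error and would be flagged for manual review.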

Author information

Corresponding author

Correspondence to Peng Jin.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Jin, P., Qiu, L., Zhu, X., Liu, P. (2014). A Hypothesis on Word Similarity and Its Application. In: Su, X., He, T. (eds) Chinese Lexical Semantics. CLSW 2014. Lecture Notes in Computer Science, vol 8922. Springer, Cham. https://doi.org/10.1007/978-3-319-14331-6_32

  • DOI: https://doi.org/10.1007/978-3-319-14331-6_32

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-14330-9

  • Online ISBN: 978-3-319-14331-6

  • eBook Packages: Computer Science, Computer Science (R0)
