Word Similarity Computation with Extreme-Similar Method

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10612)

Abstract

Chinese word similarity calculation is a key technique in Chinese information processing. The most widely used word-based similarity calculations often fail to detect subtle differences between two words, which can lead to gross misestimation of their similarity. In this paper, we propose a new method for calculating the similarity between two Chinese words, with a particular focus on pairs of words that are very similar in meaning. A hybrid combination strategy is formulated that incorporates other similarity calculations for scenarios between these two extreme conditions. The proposed method is trained with different corpora and models; its output is then combined with the score obtained from HowNet, and the final similarity value is refined accordingly. This model improves on the existing strategies. Experiments on very similar words were conducted with two evaluation metrics, the Spearman rank correlation coefficient and the Pearson correlation coefficient. Our final results are 0.427/0.421, which outperform the existing state-of-the-art models and show the effectiveness of the proposed method.
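To make the two evaluation metrics concrete, the following is a minimal sketch of how Spearman and Pearson correlations between human similarity ratings and model scores might be computed. It is not the authors' evaluation script: the function names and the sample scores (`human_scores`, `model_scores`) are hypothetical and chosen only for illustration. Spearman is computed here as the Pearson correlation of average ranks, which is the standard definition when ties are present.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: human ratings and model scores for five word pairs.
human_scores = [9.2, 8.7, 7.9, 3.1, 1.4]
model_scores = [0.90, 0.95, 0.70, 0.40, 0.10]

print(f"Spearman: {spearman(human_scores, model_scores):.3f}")
print(f"Pearson:  {pearson(human_scores, model_scores):.3f}")
```

Spearman depends only on the ordering of the scores, so it is unaffected by the scale of a model's output, whereas Pearson measures linear agreement between the raw values. Reporting both, as the abstract does, covers both views.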

Notes

  1. http://www.keenage.com/zhiwang/e_zhiwang_r.html

  2. http://github.com/fxsjy/jieba

Correspondence to Li Li.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Du, P., Chen, S., Xu, X., Li, L. (2017). Word Similarity Computation with Extreme-Similar Method. In: Song, S., Renz, M., Moon, YS. (eds) Web and Big Data. APWeb-WAIM 2017. Lecture Notes in Computer Science, vol 10612. Springer, Cham. https://doi.org/10.1007/978-3-319-69781-9_6

  • DOI: https://doi.org/10.1007/978-3-319-69781-9_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-69780-2

  • Online ISBN: 978-3-319-69781-9

  • eBook Packages: Computer Science, Computer Science (R0)
