Abstract
Chinese word similarity calculation is a key technique in Chinese information processing. The most widely used word-based similarity calculations often fail to detect subtle differences between two words, which can lead to gross mis-estimation of their similarity. In this paper, we propose a new method for calculating the similarity between two Chinese words, with a particular focus on pairs of words that are very similar in meaning. A hybrid combination strategy is formulated that incorporates other similarity calculations for scenarios between these two extreme conditions. The proposed method is trained with different corpora and models, its output is combined with the score obtained from HowNet, and the final similarity value is refined accordingly. This model improves on existing strategies. Experiments on very similar words were conducted with two evaluation metrics, the Spearman rank correlation coefficient and the Pearson correlation coefficient. Our final results are 0.427/0.421, which outperform the existing state-of-the-art models and show the effectiveness of the proposed method.
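The two evaluation metrics named in the abstract can be illustrated with a minimal sketch. It is not the authors' evaluation script: the function names and the toy scores below are assumptions made for illustration, and ties are handled with average ranks, which the abstract does not specify.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical human ratings and model scores for four word pairs.
gold = [9.2, 8.7, 9.8, 7.5]
predicted = [0.81, 0.78, 0.95, 0.60]
print(spearman(gold, predicted), pearson(gold, predicted))
```

Spearman depends only on the ordering of the scores, while Pearson also reflects how linearly the model scores track the human ratings, which is why the two are usually reported together.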
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Du, P., Chen, S., Xu, X., Li, L. (2017). Word Similarity Computation with Extreme-Similar Method. In: Song, S., Renz, M., Moon, YS. (eds) Web and Big Data. APWeb-WAIM 2017. Lecture Notes in Computer Science, vol 10612. Springer, Cham. https://doi.org/10.1007/978-3-319-69781-9_6
DOI: https://doi.org/10.1007/978-3-319-69781-9_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-69780-2
Online ISBN: 978-3-319-69781-9
eBook Packages: Computer Science, Computer Science (R0)