Word Similarity Computation with Extreme-Similar Method

Du, Peiwen; Chen, Siding; Xu, Xiaofei; Li, Li

doi:10.1007/978-3-319-69781-9_6

Conference paper
First Online: 08 November 2017

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10612)

Abstract

Chinese word similarity calculation is a key technique in Chinese information processing. The most widely used word-based similarity measures often fail to detect subtle differences between two words, which can lead to gross misestimation of their similarity. In this paper, we propose a new method for calculating the similarity between two Chinese words, with a particular focus on pairs of words that are very similar in meaning. For scenarios that fall between the extreme conditions, a hybrid combination strategy incorporating other similarity calculations is formulated. The proposed method is trained with different corpora and models; its output is then combined with the score obtained from HowNet, and the final similarity value is refined accordingly. The resulting model improves on existing strategies. Experiments on very similar words were conducted with two evaluation metrics, the Spearman rank correlation coefficient and the Pearson correlation coefficient. Our final results are 0.427/0.421, which outperform existing state-of-the-art models and show the effectiveness of the proposed method.
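The abstract does not give the combination formula or the evaluation code, so the following is only a minimal sketch of the general idea: blend a corpus-based score with a lexicon-based score, then compare predictions against human ratings using the two metrics named above. The function names, the linear blend, and the `weight` parameter are assumptions made for illustration and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def hybrid_score(embedding_sim, lexicon_sim, weight=0.5):
    """Hypothetical linear blend of a corpus-based and a lexicon-based score.

    `weight` is an illustrative parameter, not a value from the paper.
    """
    return weight * embedding_sim + (1.0 - weight) * lexicon_sim


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def average_ranks(values):
    """1-based ranks in ascending order; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(xs), average_ranks(ys))
```

Given a list of model scores and a list of human ratings for the same word pairs, `spearman(scores, ratings)` and `pearson(scores, ratings)` produce the two evaluation figures. In practice, library routines such as `scipy.stats.spearmanr` and `scipy.stats.pearsonr` would normally be used instead of hand-written versions.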

Author information

Authors and Affiliations

School of Computer and Information Science, Southwest University, Chongqing, 400715, China
Peiwen Du, Siding Chen, Xiaofei Xu & Li Li

Corresponding author

Correspondence to Li Li.

Editor information

Editors and Affiliations

Tsinghua University, Beijing, China
Shaoxu Song
George Mason University, Fairfax, Virginia, USA
Matthias Renz
Kangwon National University, Chuncheon, Korea (Republic of)
Yang-Sae Moon

About this paper

Cite this paper

Du, P., Chen, S., Xu, X., Li, L. (2017). Word Similarity Computation with Extreme-Similar Method. In: Song, S., Renz, M., Moon, YS. (eds) Web and Big Data. APWeb-WAIM 2017. Lecture Notes in Computer Science, vol 10612. Springer, Cham. https://doi.org/10.1007/978-3-319-69781-9_6

DOI: https://doi.org/10.1007/978-3-319-69781-9_6
Published: 08 November 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-69780-2
Online ISBN: 978-3-319-69781-9
eBook Packages: Computer Science, Computer Science (R0)
