Gathering Information About Word Similarity from Neighbor Sentences

  • Conference paper
Text, Speech, and Dialogue (TSD 2016)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9924)

Abstract

This paper presents the first results on detecting semantic word similarity for the Russian translations of the Miller-Charles and Rubenstein-Goodenough sets, which were prepared for RUSSE-2015, the first Russian semantic similarity evaluation. The experiments were carried out on three text collections: Russian Wikipedia, a news collection, and the union of the two. We found that the best results in detecting lexical paradigmatic relations are achieved by combining word2vec with a new type of feature based on word co-occurrences in neighboring sentences.
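
To illustrate the kind of feature the abstract refers to, the sketch below counts how often two words appear in adjacent sentences and interpolates that count with a cosine similarity between word vectors. It is only an illustration: the abstract does not specify how the neighbor-sentence features are computed or how they are combined with word2vec, so the function names, the max-count normalization, and the weight alpha are all assumptions.

```python
from collections import Counter
import math


def neighbor_sentence_cooccurrences(sentences):
    """Count unordered word pairs that occur in adjacent sentences.

    `sentences` is a list of token lists for one document. A pair is
    counted at most once per pair of neighboring sentences; words that
    only share a single sentence are not counted here.
    """
    counts = Counter()
    for left, right in zip(sentences, sentences[1:]):
        pairs = {
            frozenset((a, b))
            for a in set(left)
            for b in set(right)
            if a != b
        }
        counts.update(pairs)
    return counts


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def combined_score(cos_sim, count, max_count, alpha=0.5):
    """Hypothetical combination: a weighted sum of an embedding similarity
    and a co-occurrence count scaled to [0, 1] by the largest count."""
    scaled = count / max_count if max_count else 0.0
    return alpha * cos_sim + (1.0 - alpha) * scaled


if __name__ == "__main__":
    doc = [["cat", "sat"], ["dog", "barked"], ["cat", "ran"]]
    counts = neighbor_sentence_cooccurrences(doc)
    pair = frozenset(("cat", "dog"))
    # Toy vectors stand in for word2vec embeddings.
    score = combined_score(cosine([1.0, 0.5], [0.8, 0.6]),
                           counts[pair], max(counts.values()))
    print(pair, counts[pair], round(score, 3))
```

In practice the vectors would come from a trained word2vec model, and the combined scores would be compared against the human judgments in the Miller-Charles and Rubenstein-Goodenough sets with a rank or linear correlation.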

Notes

  1. http://russe.nlpub.ru/

  2. https://code.google.com/archive/p/word2vec/

Acknowledgments

This work was partially supported by the Russian Foundation for Basic Research, grant N14-07-00383.

Corresponding author

Correspondence to Natalia Loukachevitch.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Loukachevitch, N., Alekseev, A. (2016). Gathering Information About Word Similarity from Neighbor Sentences. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech, and Dialogue. TSD 2016. Lecture Notes in Computer Science, vol 9924. Springer, Cham. https://doi.org/10.1007/978-3-319-45510-5_16

  • DOI: https://doi.org/10.1007/978-3-319-45510-5_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-45509-9

  • Online ISBN: 978-3-319-45510-5

  • eBook Packages: Computer Science, Computer Science (R0)
