Evaluation of Embedded Vectors for Lexemes and Synsets Toward Expansion of Japanese WordNet

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1215)

Abstract

In this paper, we discuss the possibility of expanding Japanese WordNet using AutoExtend, which produces embedded vectors based on the structure of a dictionary. Recent work on several kinds of NLP tasks has shown that distributed word representations are effective. However, word embeddings constructed from the contexts of surrounding words make it difficult to discriminate between the meanings of a word, because only one vector is produced for each word. AutoExtend, by contrast, produces embedded vectors for meanings and concepts as well as for words by taking into account the thesaurus structure of a dictionary, and it has been applied to English WordNet. In this paper, we therefore apply AutoExtend to a Japanese dictionary, i.e., Japanese WordNet, to construct embedded vectors for lexemes and synsets as well as for words, taking into account the thesaurus structure of Japanese WordNet. The experimental results show that the embedded vectors constructed by AutoExtend can help find the corresponding meanings of words that are not yet registered in the dictionary.
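The following is a minimal sketch, not the authors' implementation, of the dictionary structure the abstract relies on. It assumes AutoExtend's core constraint that a word vector is the sum of its lexeme vectors and a synset vector is the sum of the lexeme vectors it contains. In AutoExtend the lexeme vectors are learned with an autoencoder; here they are simply given as hypothetical toy inputs. Ranking synsets by cosine similarity to an unregistered word's vector is likewise an illustrative assumption about how "finding corresponding meanings" could work, not a procedure confirmed by this page. The function names (`word_vectors`, `synset_vectors`, `rank_synsets`) and the toy data are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
# Hypothetical lexeme table: (word, synset_id) -> lexeme vector.
LexemeTable = Dict[Tuple[str, str], Vector]


def _add(u: Vector, v: Vector) -> Vector:
    """Element-wise sum of two equal-length vectors."""
    return [a + b for a, b in zip(u, v)]


def _cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def word_vectors(lexemes: LexemeTable) -> Dict[str, Vector]:
    """Word vector = sum of the lexeme vectors of that word (AutoExtend constraint)."""
    words: Dict[str, Vector] = {}
    for (word, _synset), vec in lexemes.items():
        words[word] = _add(words[word], vec) if word in words else list(vec)
    return words


def synset_vectors(lexemes: LexemeTable) -> Dict[str, Vector]:
    """Synset vector = sum of the lexeme vectors belonging to that synset."""
    synsets: Dict[str, Vector] = {}
    for (_word, synset), vec in lexemes.items():
        synsets[synset] = _add(synsets[synset], vec) if synset in synsets else list(vec)
    return synsets


def rank_synsets(query: Vector, synsets: Dict[str, Vector], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k synsets most similar (by cosine) to a query word vector.

    Assumption: the query is an embedding of a word not registered in the dictionary.
    """
    scored = [(sid, _cosine(query, vec)) for sid, vec in synsets.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy data: an ambiguous word "bank" with two senses.
TOY_LEXEMES: LexemeTable = {
    ("bank", "finance"): [1.0, 0.0],
    ("bank", "river"): [0.0, 1.0],
    ("lender", "finance"): [0.8, 0.2],
    ("shore", "river"): [0.1, 0.9],
}

if __name__ == "__main__":
    syns = synset_vectors(TOY_LEXEMES)
    print(word_vectors(TOY_LEXEMES)["bank"])  # sum of both "bank" lexemes
    # An unregistered word whose embedding points toward the financial sense.
    print(rank_synsets([0.9, 0.1], syns, k=2))  # "finance" ranks first
```

The toy example shows why sense-level vectors matter: the single word vector for "bank" mixes both senses, while the synset vectors keep them apart, so a new word can be matched to one specific meaning.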

Acknowledgment

Part of the research reported in this paper is supported by JSPS KAKENHI (JP19K00552) and by the NINJAL project “Development of and Research with a parsed corpus of Japanese” (JSPS KAKENHI JP15H03210).

Author information

Correspondence to Daiki Ko.

Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Ko, D., Takeuchi, K. (2020). Evaluation of Embedded Vectors for Lexemes and Synsets Toward Expansion of Japanese WordNet. In: Nguyen, LM., Phan, XH., Hasida, K., Tojo, S. (eds) Computational Linguistics. PACLING 2019. Communications in Computer and Information Science, vol 1215. Springer, Singapore. https://doi.org/10.1007/978-981-15-6168-9_7

  • DOI: https://doi.org/10.1007/978-981-15-6168-9_7

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-15-6167-2

  • Online ISBN: 978-981-15-6168-9

  • eBook Packages: Computer Science, Computer Science (R0)