Domain Adaptation for Word Sense Disambiguation Using Word Embeddings

  • Conference paper
  • In: Computational Linguistics and Intelligent Text Processing (CICLing 2017)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10761)

Abstract

In this paper, we propose domain adaptation in word sense disambiguation (WSD) using word embeddings. The validity of word embeddings obtained from a huge corpus, e.g., Wikipedia, for WSD has already been shown, but their validity in a domain adaptation framework has not been discussed before. In addition, even if they are valid, how their effect varies with the domain of the corpus is still unknown. Therefore, we investigate the performance of domain adaptation in WSD using word embeddings obtained from the source, target, and general corpora and examine (1) whether word embeddings are valid for domain adaptation of WSD and (2) if they are, how their effect depends on the domain of the corpus. The experiments using Japanese corpora revealed that the accuracy of WSD was highest when we used the word embeddings obtained from the target corpus.
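
The sketch below illustrates, under stated assumptions, the kind of pipeline the abstract and notes describe: word2vec embeddings are trained on one of the corpora (source, target, or a general corpus such as Wikipedia), the vectors of the words in a context window are averaged into a feature vector, and an SVM sense classifier is trained per target word on the labeled data. This is not the authors' released code; gensim (version 4 or later) stands in for the word2vec tool and scikit-learn's SVC stands in for libsvm, and all function names and data formats are hypothetical.

import numpy as np
from gensim.models import Word2Vec
from sklearn.svm import SVC

def train_embeddings(tokenized_sentences, dim=200):
    # Skip-gram word2vec trained on the chosen corpus (source, target,
    # or a general corpus such as Wikipedia); gensim >= 4 API assumed.
    return Word2Vec(tokenized_sentences, vector_size=dim, window=5,
                    min_count=5, sg=1)

def average_context_embedding(tokens, position, model, window=5):
    # Average-Word-Embeddings: mean of the vectors of the words in the
    # context window around the target word occurrence.
    lo, hi = max(0, position - window), min(len(tokens), position + window + 1)
    vecs = [model.wv[w] for i, w in enumerate(tokens[lo:hi], start=lo)
            if i != position and w in model.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(model.vector_size)

def train_sense_classifier(labeled_instances, model):
    # One classifier per ambiguous target word.
    # labeled_instances: list of (tokens, target_position, sense_id) tuples.
    # SVC(probability=True) stands in for libsvm's -b probability option.
    X = np.vstack([average_context_embedding(t, p, model)
                   for t, p, _ in labeled_instances])
    y = [sense for _, _, sense in labeled_instances]
    return SVC(kernel="linear", probability=True).fit(X, y)

Comparing classifiers built from embeddings trained on the source, target, and general corpora would then reproduce the kind of comparison reported in the abstract.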

Notes

  1. Sugawara et al. [31] reported that Context-Word-Embeddings improved WSD results more than Average-Word-Embeddings, i.e., the average of the vector representations of the words in the context window (both representations are sketched in the code after the abstract and after these notes).

  2. https://code.google.com/archive/p/word2vec/.

  3. We used the -b option of libsvm.

  4. https://github.com/jordwest/mecab-docs-en.

  5. http://sourceforge.net/projects/cabocha/.

  6. SemEval-2010 Task: Japanese WSD [26] is included in this corpus.

  7. https://dumps.wikimedia.org/jawiki/.

  8. The number of tokens in each of corpora (1)–(5) is less than two percent of the number of tokens in Wikipedia, and the number in corpus (6) is about 46% of it.

  9. Note that the most frequent sense in the target corpus cannot be known without labeled target data, and this most-frequent-sense baseline is hard to beat [27].
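
For the distinction in note 1, the following is a minimal sketch of the Context-Word-Embeddings representation as I read Sugawara et al. [31] (not their code): instead of averaging, the embedding of each word at a fixed offset from the target is kept as its own sub-vector and the sub-vectors are concatenated, so the classifier can tell at which position a context word occurred. The averaged representation and the model object are as in the sketch after the abstract; the function name is hypothetical.

import numpy as np

def context_word_embeddings(tokens, position, model, window=5):
    # One sub-vector per relative offset in -window..+window (excluding 0);
    # zero vectors for out-of-range or out-of-vocabulary positions.
    dim = model.vector_size
    parts = []
    for offset in range(-window, window + 1):
        if offset == 0:
            continue  # skip the target word itself
        i = position + offset
        if 0 <= i < len(tokens) and tokens[i] in model.wv:
            parts.append(model.wv[tokens[i]])
        else:
            parts.append(np.zeros(dim))
    return np.concatenate(parts)  # shape: (2 * window * dim,)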

References

  1. Agirre, E., de Lacalle, O.L.: On robustness and domain adaptation using SVD for word sense disambiguation. In: Proceedings of COLING 2008, pp. 17–24 (2008)

  2. Agirre, E., de Lacalle, O.L.: Supervised domain adaption for WSD. In: Proceedings of EACL 2009, pp. 42–50 (2009)

  3. Blitzer, J., McDonald, R., Pereira, F.: Domain adaptation with structural correspondence learning. In: Proceedings of EMNLP 2006, pp. 120–128 (2006)

  4. Chan, Y.S., Ng, H.T.: Estimating class priors in domain adaptation for word sense disambiguation. In: Proceedings of COLING-ACL 2006, pp. 89–96 (2006)

  5. Chan, Y.S., Ng, H.T.: Domain adaptation with active learning for word sense disambiguation. In: Proceedings of ACL 2007, pp. 49–56 (2007)

  6. Chang, C.C., Lin, C.J.: LIBSVM: a library for support vector machines (2001). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm

  7. Chen, T., Xu, R., He, Y., Wang, X.: Improving distributed representation of word sense via WordNet gloss composition and context clustering. In: Proceedings of ACL-IJCNLP 2015, pp. 15–20 (2015)

  8. Clinchant, S., Csurka, G., Chidlovskii, B.: A domain adaptation regularization for denoising autoencoders. In: Proceedings of ACL 2016, pp. 26–31 (2016)

  9. Daumé III, H.: Frustratingly easy domain adaptation. In: Proceedings of ACL 2007, pp. 256–263 (2007)

  10. Daumé III, H., Kumar, A., Saha, A.: Frustratingly easy semi-supervised domain adaptation. In: Proceedings of the 2010 Workshop on Domain Adaptation for Natural Language Processing, ACL 2010, pp. 23–59 (2010)

  11. Escudero, G., Màrquez, L., Rigau, G.: An empirical study of the domain dependence of supervised word sense disambiguation systems. In: Proceedings of EMNLP/VLC 2000, pp. 172–180 (2000)

  12. Ganin, Y., Lempitsky, V.: Unsupervised domain adaptation by backpropagation. In: Proceedings of the 32nd ICML, pp. 1180–1189 (2015)

  13. Hashida, K., Isahara, H., Tokunaga, T., Hashimoto, M., Ogino, S., Kashino, W.: The RWC text databases. In: Proceedings of the First International Conference on Language Resources and Evaluation, pp. 457–461 (1998)

  14. Izquierdo, R., Suárez, A., Rigau, G.: Word vs. class-based word sense disambiguation. J. Artif. Intell. Res. 54, 83–122 (2015)

  15. Jiang, J., Zhai, C.: Instance weighting for domain adaptation in NLP. In: Proceedings of ACL 2007, pp. 264–271 (2007)

  16. Komiya, K., Okumura, M.: Automatic determination of a domain adaptation method for word sense disambiguation using decision tree learning. In: Proceedings of IJCNLP 2011, pp. 1107–1115 (2011)

  17. Komiya, K., Okumura, M.: Automatic domain adaptation for word sense disambiguation based on comparison of multiple classifiers. In: Proceedings of PACLIC 2012, pp. 77–85 (2012)

  18. Kouno, K., Shinnou, H., Sasaki, M., Komiya, K.: Unsupervised domain adaptation for word sense disambiguation using stacked denoising autoencoder. In: Proceedings of PACLIC-29, pp. 224–231 (2015)

  19. Kunii, S., Shinnou, H.: Combined use of topic models on unsupervised domain adaptation for word sense disambiguation. In: Proceedings of PACLIC-27, pp. 224–231 (2013)

  20. Maekawa, K.: Balanced corpus of contemporary written Japanese. In: Proceedings of the 6th Workshop on Asian Language Resources (ALR), pp. 101–102 (2008)

  21. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. In: Proceedings of ICLR Workshop 2013, pp. 1–12 (2013)

  22. Mikolov, T., Sutskever, I., Chen, K., Corrado, G., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Proceedings of NIPS 2013, pp. 1–9 (2013)

  23. Mikolov, T., Yih, W.-t., Zweig, G.: Linguistic regularities in continuous space word representations. In: Proceedings of NAACL 2013, pp. 746–751 (2013)

  24. National Institute for Japanese Language and Linguistics: Word List by Semantic Principles. Shuuei Shuppan (1964). (in Japanese)

  25. Nishio, M., Iwabuchi, E., Mizutani, S.: Iwanami Kokugo Jiten Dai Go Han. Iwanami Publisher (1994). (in Japanese)

  26. Okumura, M., Shirai, K., Komiya, K., Yokono, H.: SemEval-2010 task: Japanese WSD. In: Proceedings of SemEval-2010, ACL 2010, pp. 69–74 (2010)

  27. Postma, M., Izquierdo, R., Agirre, E., Rigau, G., Vossen, P.: Addressing the MFS bias in WSD systems. In: Proceedings of the 10th Language Resources and Evaluation Conference, LREC 2016, pp. 1695–1700 (2016)

  28. Rothe, S., Schütze, H.: AutoExtend: extending word embeddings to embeddings for synsets and lexemes. In: Proceedings of ACL 2015, pp. 1793–1803 (2015)

  29. Shinnou, H., Onodera, Y., Sasaki, M., Komiya, K.: Active learning to remove source instances for domain adaptation for word sense disambiguation. In: Proceedings of PACLING-2015, pp. 224–231 (2015)

  30. Shinnou, H., Sasaki, M., Komiya, K.: Learning under covariate shift for domain adaptation for word sense disambiguation. In: Proceedings of PACLIC-29, pp. 215–223 (2015)

  31. Sugawara, H., Takamura, H., Sasano, R., Okumura, M.: Context representation with word embeddings for WSD. In: Proceedings of PACLING 2015 (2015)

  32. Sun, B., Feng, J., Saenko, K.: Return of frustratingly easy domain adaptation. In: Proceedings of AAAI-16, pp. 2058–2065 (2016)

  33. Taghipour, K., Ng, H.T.: Semi-supervised word sense disambiguation using word embeddings in general and specific domains. In: Proceedings of NAACL-HLT 2015, pp. 314–323 (2015)

  34. Tang, D., Wei, F., Yang, N., Zhou, M., Liu, T., Qin, B.: Learning sentiment-specific word embedding for Twitter sentiment classification. In: Proceedings of ACL 2014, pp. 1555–1565 (2014)

  35. Vu, T., Parker, D.S.: K-embeddings: learning conceptual embeddings for words using context. In: Proceedings of NAACL-HLT 2016, pp. 1262–1267 (2016)

Acknowledgment

This work was supported by JSPS KAKENHI Grant Number 15K16046.

Author information

Corresponding author

Correspondence to Kanako Komiya.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Komiya, K., Suzuki, S., Sasaki, M., Shinnou, H., Okumura, M. (2018). Domain Adaptation for Word Sense Disambiguation Using Word Embeddings. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2017. Lecture Notes in Computer Science, vol. 10761. Springer, Cham. https://doi.org/10.1007/978-3-319-77113-7_16

  • DOI: https://doi.org/10.1007/978-3-319-77113-7_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-77112-0

  • Online ISBN: 978-3-319-77113-7

  • eBook Packages: Computer Science (R0)
