Domain Adaptation for Word Sense Disambiguation Using Word Embeddings

  • Conference paper
  • In: Computational Linguistics and Intelligent Text Processing (CICLing 2017)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10761)

Abstract

In this paper, we propose domain adaptation in word sense disambiguation (WSD) using word embeddings. The validity of word embeddings obtained from a huge corpus, e.g., Wikipedia, for WSD has already been shown, but their validity in a domain adaptation framework has not been discussed before. In addition, even if they are valid, how their effect varies with the domain of the corpus is still unknown. Therefore, we investigate the performance of domain adaptation in WSD using word embeddings obtained from the source, target, and general corpora and examine (1) whether word embeddings are valid for domain adaptation of WSD and (2) if they are, how their effect depends on the domain of the corpus. The experiments using Japanese corpora revealed that the accuracy of WSD was highest when we used the word embeddings obtained from the target corpus.
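
The sketch below illustrates, under stated assumptions, the kind of pipeline the abstract and notes describe: word2vec embeddings are trained on one of the corpora (source, target, or a general corpus such as Wikipedia), the vectors of the words in a context window are averaged into a feature vector, and an SVM sense classifier is trained per target word on the labeled data. This is not the authors' released code; gensim (version 4 or later) stands in for the word2vec tool and scikit-learn's SVC stands in for libsvm, and all function names and data formats are hypothetical.

import numpy as np
from gensim.models import Word2Vec
from sklearn.svm import SVC

def train_embeddings(tokenized_sentences, dim=200):
    # Skip-gram word2vec trained on the chosen corpus (source, target,
    # or a general corpus such as Wikipedia); gensim >= 4 API assumed.
    return Word2Vec(tokenized_sentences, vector_size=dim, window=5,
                    min_count=5, sg=1)

def average_context_embedding(tokens, position, model, window=5):
    # Average-Word-Embeddings: mean of the vectors of the words in the
    # context window around the target word occurrence.
    lo, hi = max(0, position - window), min(len(tokens), position + window + 1)
    vecs = [model.wv[w] for i, w in enumerate(tokens[lo:hi], start=lo)
            if i != position and w in model.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(model.vector_size)

def train_sense_classifier(labeled_instances, model):
    # One classifier per ambiguous target word.
    # labeled_instances: list of (tokens, target_position, sense_id) tuples.
    # SVC(probability=True) stands in for libsvm's -b probability option.
    X = np.vstack([average_context_embedding(t, p, model)
                   for t, p, _ in labeled_instances])
    y = [sense for _, _, sense in labeled_instances]
    return SVC(kernel="linear", probability=True).fit(X, y)

Comparing classifiers built from embeddings trained on the source, target, and general corpora would then reproduce the kind of comparison reported in the abstract.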

Notes

  1. Sugawara et al. [31] reported that Context-Word-Embeddings improved WSD results more than Average-Word-Embeddings, i.e., the average of the vector representations of the words in the context window (both representations are sketched in the code after the abstract and after these notes).

  2. https://code.google.com/archive/p/word2vec/.

  3. We used the -b option of libsvm.

  4. https://github.com/jordwest/mecab-docs-en.

  5. http://sourceforge.net/projects/cabocha/.

  6. SemEval-2010 Task: Japanese WSD [26] is included in this corpus.

  7. https://dumps.wikimedia.org/jawiki/.

  8. The number of tokens in each of corpora (1)–(5) is less than two percent of the number of tokens in Wikipedia, and the number in corpus (6) is about 46% of it.

  9. Note that the most frequent sense in the target corpus cannot be known without labeled target data, and this most-frequent-sense baseline is hard to beat [27].
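
For the distinction in note 1, the following is a minimal sketch of the Context-Word-Embeddings representation as I read Sugawara et al. [31] (not their code): instead of averaging, the embedding of each word at a fixed offset from the target is kept as its own sub-vector and the sub-vectors are concatenated, so the classifier can tell at which position a context word occurred. The averaged representation and the model object are as in the sketch after the abstract; the function name is hypothetical.

import numpy as np

def context_word_embeddings(tokens, position, model, window=5):
    # One sub-vector per relative offset in -window..+window (excluding 0);
    # zero vectors for out-of-range or out-of-vocabulary positions.
    dim = model.vector_size
    parts = []
    for offset in range(-window, window + 1):
        if offset == 0:
            continue  # skip the target word itself
        i = position + offset
        if 0 <= i < len(tokens) and tokens[i] in model.wv:
            parts.append(model.wv[tokens[i]])
        else:
            parts.append(np.zeros(dim))
    return np.concatenate(parts)  # shape: (2 * window * dim,)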

References

  1. Agirre, E., de Lacalle, O.L.: On robustness and domain adaptation using SVD for word sense disambiguation. In: Proceedings of COLING 2008, pp. 17–24 (2008)

  2. Agirre, E., de Lacalle, O.L.: Supervised domain adaption for WSD. In: Proceedings of EACL 2009, pp. 42–50 (2009)

  3. Blitzer, J., McDonald, R., Pereira, F.: Domain adaptation with structural correspondence learning. In: Proceedings of EMNLP 2006, pp. 120–128 (2006)

  4. Chan, Y.S., Ng, H.T.: Estimating class priors in domain adaptation for word sense disambiguation. In: Proceedings of COLING-ACL 2006, pp. 89–96 (2006)

  5. Chan, Y.S., Ng, H.T.: Domain adaptation with active learning for word sense disambiguation. In: Proceedings of ACL 2007, pp. 49–56 (2007)

  6. Chang, C.C., Lin, C.J.: LIBSVM: a library for support vector machines (2001). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm

  7. Chen, T., Xu, R., He, Y., Wang, X.: Improving distributed representation of word sense via WordNet gloss composition and context clustering. In: Proceedings of ACL-IJCNLP 2015, pp. 15–20 (2015)

  8. Clinchant, S., Csurka, G., Chidlovskii, B.: A domain adaptation regularization for denoising autoencoders. In: Proceedings of ACL 2016, pp. 26–31 (2016)

  9. Daumé III, H.: Frustratingly easy domain adaptation. In: Proceedings of ACL 2007, pp. 256–263 (2007)

  10. Daumé III, H., Kumar, A., Saha, A.: Frustratingly easy semi-supervised domain adaptation. In: Proceedings of the 2010 Workshop on Domain Adaptation for Natural Language Processing, ACL 2010, pp. 23–59 (2010)

  11. Escudero, G., Màrquez, L., Rigau, G.: An empirical study of the domain dependence of supervised word sense disambiguation systems. In: Proceedings of EMNLP/VLC 2000, pp. 172–180 (2000)

  12. Ganin, Y., Lempitsky, V.: Unsupervised domain adaptation by backpropagation. In: Proceedings of the 32nd ICML, pp. 1180–1189 (2015)

  13. Hashida, K., Isahara, H., Tokunaga, T., Hashimoto, M., Ogino, S., Kashino, W.: The RWC text databases. In: Proceedings of the First International Conference on Language Resources and Evaluation, pp. 457–461 (1998)

  14. Izquierdo, R., Suárez, A., Rigau, G.: Word vs. class-based word sense disambiguation. J. Artif. Intell. Res. 54, 83–122 (2015)

  15. Jiang, J., Zhai, C.: Instance weighting for domain adaptation in NLP. In: Proceedings of ACL 2007, pp. 264–271 (2007)

  16. Komiya, K., Okumura, M.: Automatic determination of a domain adaptation method for word sense disambiguation using decision tree learning. In: Proceedings of IJCNLP 2011, pp. 1107–1115 (2011)

  17. Komiya, K., Okumura, M.: Automatic domain adaptation for word sense disambiguation based on comparison of multiple classifiers. In: Proceedings of PACLIC 2012, pp. 77–85 (2012)

  18. Kouno, K., Shinnou, H., Sasaki, M., Komiya, K.: Unsupervised domain adaptation for word sense disambiguation using stacked denoising autoencoder. In: Proceedings of PACLIC-29, pp. 224–231 (2015)

  19. Kunii, S., Shinnou, H.: Combined use of topic models on unsupervised domain adaptation for word sense disambiguation. In: Proceedings of PACLIC-27, pp. 224–231 (2013)

  20. Maekawa, K.: Balanced corpus of contemporary written Japanese. In: Proceedings of the 6th Workshop on Asian Language Resources (ALR), pp. 101–102 (2008)

  21. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. In: Proceedings of ICLR Workshop 2013, pp. 1–12 (2013)

  22. Mikolov, T., Sutskever, I., Chen, K., Corrado, G., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Proceedings of NIPS 2013, pp. 1–9 (2013)

  23. Mikolov, T., Yih, W.-t., Zweig, G.: Linguistic regularities in continuous space word representations. In: Proceedings of NAACL 2013, pp. 746–751 (2013)

  24. National Institute for Japanese Language and Linguistics: Word List by Semantic Principles. Shuuei Shuppan (1964). (in Japanese)

  25. Nishio, M., Iwabuchi, E., Mizutani, S.: Iwanami Kokugo Jiten Dai Go Han. Iwanami Publisher (1994). (in Japanese)

  26. Okumura, M., Shirai, K., Komiya, K., Yokono, H.: SemEval-2010 task: Japanese WSD. In: Proceedings of SemEval-2010, ACL 2010, pp. 69–74 (2010)

  27. Postma, M., Izquierdo, R., Agirre, E., Rigau, G., Vossen, P.: Addressing the MFS bias in WSD systems. In: Proceedings of the 10th Language Resources and Evaluation Conference, LREC 2016, pp. 1695–1700 (2016)

  28. Rothe, S., Schütze, H.: AutoExtend: extending word embeddings to embeddings for synsets and lexemes. In: Proceedings of ACL 2015, pp. 1793–1803 (2015)

  29. Shinnou, H., Onodera, Y., Sasaki, M., Komiya, K.: Active learning to remove source instances for domain adaptation for word sense disambiguation. In: Proceedings of PACLING-2015, pp. 224–231 (2015)

  30. Shinnou, H., Sasaki, M., Komiya, K.: Learning under covariate shift for domain adaptation for word sense disambiguation. In: Proceedings of PACLIC-29, pp. 215–223 (2015)

  31. Sugawara, H., Takamura, H., Sasano, R., Okumura, M.: Context representation with word embeddings for WSD. In: Proceedings of PACLING 2015 (2015)

  32. Sun, B., Feng, J., Saenko, K.: Return of frustratingly easy domain adaptation. In: Proceedings of AAAI-16, pp. 2058–2065 (2016)

  33. Taghipour, K., Ng, H.T.: Semi-supervised word sense disambiguation using word embeddings in general and specific domains. In: Proceedings of NAACL-HLT 2015, pp. 314–323 (2015)

  34. Tang, D., Wei, F., Yang, N., Zhou, M., Liu, T., Qin, B.: Learning sentiment-specific word embedding for Twitter sentiment classification. In: Proceedings of ACL 2014, pp. 1555–1565 (2014)

  35. Vu, T., Parker, D.S.: K-embeddings: learning conceptual embeddings for words using context. In: Proceedings of NAACL-HLT 2016, pp. 1262–1267 (2016)

Acknowledgment

This work was supported by JSPS KAKENHI Grant Number 15K16046.

Author information

Corresponding author

Correspondence to Kanako Komiya.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Komiya, K., Suzuki, S., Sasaki, M., Shinnou, H., Okumura, M. (2018). Domain Adaptation for Word Sense Disambiguation Using Word Embeddings. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2017. Lecture Notes in Computer Science, vol. 10761. Springer, Cham. https://doi.org/10.1007/978-3-319-77113-7_16

  • DOI: https://doi.org/10.1007/978-3-319-77113-7_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-77112-0

  • Online ISBN: 978-3-319-77113-7

  • eBook Packages: Computer Science (R0)
