Abstract
In this paper, we propose domain adaptation for word sense disambiguation (WSD) using word embeddings. The usefulness for WSD of word embeddings trained on a huge corpus, e.g., Wikipedia, has already been shown, but their usefulness in a domain adaptation framework has not been discussed before. Moreover, if they are useful, how their effect depends on the domain of the corpus they are trained on is still unknown. We therefore investigate the performance of domain adaptation for WSD using word embeddings obtained from the source, target, and general corpora, and examine (1) whether word embeddings are useful for domain adaptation of WSD and (2) if so, how their effect depends on the domain of the corpus. Experiments on Japanese corpora revealed that WSD accuracy was highest when we used the word embeddings obtained from the target corpus.
Notes
- 1.
Sugawara et al. [31] reported that Context-Word-Embeddings improved WSD results more than Average-Word-Embeddings, i.e., the average of the vector representations of the words in the context window.
- 2.
- 3.
We used the -b option of LIBSVM, which makes the trained model output probability estimates.
- 4.
- 5.
- 6.
SemEval-2010 Task: Japanese WSD [26] is included in this corpus.
- 7.
- 8.
The numbers of tokens in corpora (1)–(5) are each less than two percent of that of Wikipedia, and that of corpus (6) is about 46%.
- 9.
Note that the most frequent sense (MFS) in the target corpus cannot be known without labeled target data, and the MFS baseline is hard to beat [27].
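Note 1 defines Average-Word-Embeddings as the average of the vectors of the words in the context window. The following minimal Python sketch computes that average and contrasts it with one position-preserving alternative, concatenating the vector of each window position. The concatenation variant is an assumption for illustration only and may differ from the exact definition of Context-Word-Embeddings in Sugawara et al. [31]; the function names and toy vectors are hypothetical.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def average_word_embeddings(context: Sequence[str], emb: Dict[str, Vector], dim: int) -> Vector:
    """Average-Word-Embeddings: the mean of the vectors of the context words.

    Out-of-vocabulary words are skipped; an all-zero vector is returned
    if no context word has an embedding.
    """
    vecs = [emb[w] for w in context if w in emb]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def concatenated_word_embeddings(context: Sequence[str], emb: Dict[str, Vector], dim: int) -> Vector:
    """Hypothetical position-preserving representation: the vectors of the
    window positions concatenated in order (zero vector for unknown words)."""
    out: Vector = []
    for w in context:
        out.extend(emb.get(w, [0.0] * dim))
    return out


# Toy 2-dimensional embeddings (hypothetical values, not from the paper).
emb = {"river": [1.0, 0.0], "money": [0.0, 1.0]}

window_a = ["river", "money"]
window_b = ["money", "river"]
print(average_word_embeddings(window_a, emb, 2), average_word_embeddings(window_b, emb, 2))
print(concatenated_word_embeddings(window_a, emb, 2), concatenated_word_embeddings(window_b, emb, 2))
```

The two windows contain the same words in a different order: averaging maps them to the same vector, whereas concatenation keeps them distinct.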
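Note 9 rests on the difference between the MFS that can be estimated from labeled source data and the MFS of the target corpus, which requires labeled target data. The following minimal Python sketch of an MFS baseline illustrates this; the function names (most_frequent_senses, mfs_accuracy) and the toy "bank" data are hypothetical and are not the paper's data or results.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Instance = Tuple[str, str]  # (target word, sense label)


def most_frequent_senses(labeled: Iterable[Instance]) -> Dict[str, str]:
    """Return the most frequent sense of each target word in the labeled data."""
    counts: Dict[str, Counter] = {}
    for word, sense in labeled:
        counts.setdefault(word, Counter())[sense] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def mfs_accuracy(mfs: Dict[str, str], test: List[Instance]) -> float:
    """Accuracy of always predicting the stored MFS for each target word."""
    correct = sum(1 for word, sense in test if mfs.get(word) == sense)
    return correct / len(test)


# Hypothetical toy data: the dominant sense differs between the domains.
source = [("bank", "finance"), ("bank", "finance"), ("bank", "river")]
target = [("bank", "river"), ("bank", "river"), ("bank", "finance")]

source_mfs = most_frequent_senses(source)  # estimable without target labels
oracle_mfs = most_frequent_senses(target)  # requires labeled target data
print(mfs_accuracy(source_mfs, target), mfs_accuracy(oracle_mfs, target))
```

Here the MFS estimated from the source domain is wrong for the target domain, so only the oracle MFS, which needs labeled target data, gives the stronger baseline.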
References
Agirre, E., de Lacalle, O.L.: On robustness and domain adaptation using SVD for word sense disambiguation. In: Proceedings of COLING 2008, pp. 17–24 (2008)
Agirre, E., de Lacalle, O.L.: Supervised domain adaption for WSD. In: Proceedings of EACL 2009, pp. 42–50 (2009)
Blitzer, J., McDonald, R., Pereira, F.: Domain adaptation with structural correspondence learning. In: Proceedings of EMNLP 2006, pp. 120–128 (2006)
Chan, Y.S., Ng, H.T.: Estimating class priors in domain adaptation for word sense disambiguation. In: Proceedings of COLING-ACL 2006, pp. 89–96 (2006)
Chan, Y.S., Ng, H.T.: Domain adaptation with active learning for word sense disambiguation. In: Proceedings of ACL 2007, pp. 49–56 (2007)
Chang, C.C., Lin, C.J.: LIBSVM: a library for support vector machines (2001), Software. http://www.csie.ntu.edu.tw/~cjlin/libsvm
Chen, T., Xu, R., He, Y., Wang, X.: Improving distributed representation of word sense via WordNet gloss composition and context clustering. In: Proceedings of ACL-IJCNLP 2015, pp. 15–20 (2015)
Clinchant, S., Csurka, G., Chidlovskii, B.: A domain adaptation regularization for denoising autoencoders. In: Proceedings of ACL 2016, pp. 26–31 (2016)
Daumé III, H.: Frustratingly easy domain adaptation. In: Proceedings of ACL 2007, pp. 256–263 (2007)
Daumé III, H., Kumar, A., Saha, A.: Frustratingly easy semi-supervised domain adaptation. In: Proceedings of the 2010 Workshop on Domain Adaptation for Natural Language Processing, ACL 2010, pp. 53–59 (2010)
Escudero, G., Màrquez, L., Rigau, G.: An empirical study of the domain dependence of supervised word sense disambiguation systems. In: Proceedings of EMNLP/VLC 2000, pp. 172–180 (2000)
Ganin, Y., Lempitsky, V.: Unsupervised domain adaptation by backpropagation. In: Proceedings of the 32nd ICML, pp. 1180–1189 (2015)
Hashida, K., Isahara, H., Tokunaga, T., Hashimoto, M., Ogino, S., Kashino, W.: The RWC text databases. In: Proceedings of the First International Conference on Language Resources and Evaluation, pp. 457–461 (1998)
Izquierdo, R., Suárez, A., Rigau, G.: Word vs. class-based word sense disambiguation. J. Artif. Intell. Res. 54, 83–122 (2015)
Jiang, J., Zhai, C.: Instance weighting for domain adaptation in NLP. In: Proceedings of ACL 2007, pp. 264–271 (2007)
Komiya, K., Okumura, M.: Automatic determination of a domain adaptation method for word sense disambiguation using decision tree learning. In: Proceedings of IJCNLP 2011, pp. 1107–1115 (2011)
Komiya, K., Okumura, M.: Automatic domain adaptation for word sense disambiguation based on comparison of multiple classifiers. In: Proceedings of PACLIC 2012, pp. 77–85 (2012)
Kouno, K., Shinnou, H., Sasaki, M., Komiya, K.: Unsupervised domain adaptation for word sense disambiguation using stacked denoising autoencoder. In: Proceedings of PACLIC-29, pp. 224–231 (2015)
Kunii, S., Shinnou, H.: Combined use of topic models on unsupervised domain adaptation for word sense disambiguation. In: Proceedings of PACLIC-27, pp. 224–231 (2013)
Maekawa, K.: Balanced corpus of contemporary written Japanese. In: Proceedings of the 6th Workshop on Asian Language Resources (ALR), pp. 101–102 (2008)
Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. In: Proceedings of ICLR Workshop 2013, pp. 1–12 (2013)
Mikolov, T., Sutskever, I., Chen, K., Corrado, G., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Proceedings of NIPS 2013, pp. 1–9 (2013)
Mikolov, T., Yih, W.-t., Zweig, G.: Linguistic regularities in continuous space word representations. In: Proceedings of NAACL 2013, pp. 746–751 (2013)
National Institute for Japanese Language and Linguistics: Word List by Semantic Principles. Shuuei Shuppan (1964) (in Japanese)
Nishio, M., Iwabuchi, E., Mizutani, S.: Iwanami Kokugo Jiten Dai Go Han. Iwanami Publisher (1994) (in Japanese)
Okumura, M., Shirai, K., Komiya, K., Yokono, H.: SemEval-2010 task: Japanese WSD. In: Proceedings of the SemEval-2010, ACL 2010, pp. 69–74 (2010)
Postma, M., Izquierdo, R., Agirre, E., Rigau, G., Vossen, P.: Addressing the MFS bias in WSD systems. In: Proceedings of the 10th Language Resources and Evaluation Conference, LREC 2016, pp. 1695–1700 (2016)
Rothe, S., Schütze, H.: AutoExtend: Extending word embeddings to embeddings for synsets and lexemes. In: Proceedings of ACL 2015, pp. 1793–1803 (2015)
Shinnou, H., Onodera, Y., Sasaki, M., Komiya, K.: Active learning to remove source instances for domain adaptation for word sense disambiguation. In: Proceedings of PACLING-2015, pp. 224–231 (2015)
Shinnou, H., Sasaki, M., Komiya, K.: Learning under covariate shift for domain adaptation for word sense disambiguation. In: Proceedings of PACLIC-29, pp. 215–223 (2015)
Sugawara, H., Takamura, H., Sasano, R., Okumura, M.: Context representation with word embeddings for WSD. In: Proceedings of PACLING 2015 (2015)
Sun, B., Feng, J., Saenko, K.: Return of frustratingly easy domain adaptation. In: Proceedings of AAAI-16, pp. 2058–2065 (2016)
Taghipour, K., Ng, H.T.: Semi-supervised word sense disambiguation using word embeddings in general and specific domains. In: Proceedings of NAACL-HLT 2015, pp. 314–323 (2015)
Tang, D., Wei, F., Yang, N., Zhou, M., Liu, T., Qin, B.: Learning sentiment-specific word embedding for Twitter sentiment classification. In: Proceedings of ACL 2014, pp. 1555–1565 (2014)
Vu, T., Parker, D.S.: K-embeddings: Learning conceptual embeddings for words using context. In: Proceedings of NAACL-HLT 2016, pp. 1262–1267 (2016)
Acknowledgment
This work was supported by JSPS KAKENHI Grant Number 15K16046.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Komiya, K., Suzuki, S., Sasaki, M., Shinnou, H., Okumura, M. (2018). Domain Adaptation for Word Sense Disambiguation Using Word Embeddings. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2017. Lecture Notes in Computer Science, vol 10761. Springer, Cham. https://doi.org/10.1007/978-3-319-77113-7_16
DOI: https://doi.org/10.1007/978-3-319-77113-7_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-77112-0
Online ISBN: 978-3-319-77113-7
eBook Packages: Computer Science, Computer Science (R0)