Abstract
Supervised Word Sense Disambiguation (WSD) has long been a popular topic in NLP, yet how to make the most of a limited volume of sense-tagged data and how to interpret diverse contexts as relevant features remain challenging research questions. This paper addresses these problems and proposes a method for effectively incorporating a variety of contexts into a neural WSD model. Our model is a Transformer-XL framework coupled with Graph Convolutional Networks (GCNs). The GCNs integrate different features from local contexts, i.e., full dependency structures, words with part-of-speech (POS) tags, and word order information, into the model. Using the hidden states obtained by the GCNs, Transformer-XL learns local and global contexts simultaneously, where the global context is obtained from the document in which the target word appears. Experimental results on a series of benchmark WSD datasets show that our method is comparable to state-of-the-art WSD methods that use only a limited amount of sense-tagged data. In particular, an ablation test verified that the dependency structure and POS features contribute to the performance improvement of our model.
Notes
- 1.
We used the 39 dependency labels provided by the Stanford CoreNLP syntactic parser for the first two types of flows, together with two types of word order and self-loops, which results in 81 (\(39 \times 2 + 2 + 1\)) different matrices in every layer.
- 2.
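The matrix count in note 1 can be reproduced with a short enumeration. This is only a sketch under stated assumptions: the label names, the direction names, and the `build_edge_types` helper are hypothetical placeholders, and reading the "two types of flows" as the two directions of a dependency edge and the "two types of word order" as forward and backward adjacency is our interpretation of the note, not something it states.

```python
# Sketch: count the per-layer weight matrices described in note 1.
# Assumption: one matrix per edge type. All names below are hypothetical.

NUM_DEP_LABELS = 39  # number of dependency labels stated in note 1

# Placeholder label names; the real labels come from the parser.
dep_labels = [f"dep_{i}" for i in range(NUM_DEP_LABELS)]


def build_edge_types(labels):
    """Enumerate edge types: each dependency label in two directions,
    two word-order flows, and one self-loop."""
    edge_types = []
    for label in labels:
        # Assumed reading of "two types of flows": both edge directions.
        edge_types.append(("dep", label, "head_to_dependent"))
        edge_types.append(("dep", label, "dependent_to_head"))
    # Assumed reading of "two types of word order": forward and backward.
    edge_types.append(("order", None, "previous_to_next"))
    edge_types.append(("order", None, "next_to_previous"))
    # One self-loop type shared by all nodes.
    edge_types.append(("self", None, "self_loop"))
    return edge_types


edge_types = build_edge_types(dep_labels)
num_matrices = len(edge_types)
print(num_matrices)  # 39 * 2 + 2 + 1 = 81
```

Under these assumptions, each entry of `edge_types` would index one weight matrix in a layer, which is why the parameter count grows with the size of the dependency label set.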
References
Al-Rfou, R., Choe, D., Constant, N., Guo, M., Jones, L.: Character-level language modeling with deeper self-attention. In: Proceedings of the 33rd AAAI Conference on Artificial Intelligence, pp. 3159–3166 (2019)
Baevski, A., Auli, M.: Adaptive input representations for neural language modeling. In: Proceedings of 7th International Conference on Learning Representations (2019)
Bastings, J., Titov, I., Aziz, W., Marcheggiani, D., Sima’an, K.: Graph convolutional encoders for syntax-aware neural machine translation. In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pp. 1957–1967 (2017)
Bevilacqua, M., Navigli, R.: Breaking through the 80% glass ceiling: raising the state of the art in word sense disambiguation by incorporating knowledge graph information. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 2854–2864 (2020)
Blevins, T., Zettlemoyer, L.: Moving down the long tail of word sense disambiguation with gloss informed bi-encoders. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 1006–1017 (2020)
Dai, Z., Yang, Z., Yang, Y., Carbonell, J., Le, Q.V., Salakhutdinov, R.: Transformer-XL: attentive language models beyond a fixed-length context. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 2978–2988 (2019)
Hadiwinoto, C., Ng, H.T., Gan, W.C.: Improved word sense disambiguation using pre-trained contextualized word representations. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, pp. 5300–5309 (2019)
Iacobacci, I., Pilehvar, M.T., Navigli, R.: Embeddings for word sense disambiguation: an evaluation study. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, pp. 897–907 (2016)
Ide, N., Véronis, J.: Introduction to the special issue on word sense disambiguation: the state of the art. J. Assoc. Comput. Linguist. 24(1), 1–40 (1998)
Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks. In: Proceedings of the 5th International Conference on Learning Representations (2017)
Levine, Y., Lenz, B., Dagan, O., Ram, O., et al.: SenseBERT: driving some sense into BERT. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 4656–4667 (2020)
Li, Q., Han, Z., Wu, X.M.: Deeper insights into graph convolutional networks for semi-supervised learning. In: Proceedings of 32nd AAAI Conference on Artificial Intelligence, pp. 3538–3545 (2018)
Luo, F., Liu, T., Xia, Q., Chang, B., Sui, Z.: Incorporating glosses into neural word sense disambiguation. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, pp. 2473–2482 (2018)
Manning, C.D., Surdeanu, M., Bauer, J., Finkel, J., Bethard, S.J., McClosky, D.: The Stanford CoreNLP natural language processing toolkit. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pp. 55–60 (2014)
Marcheggiani, D., Titov, I.: Encoding sentences with graph convolutional networks for semantic role labeling. In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pp. 1506–1515 (2017)
Melamud, O., Goldberger, J., Dagan, I.: Context2vec: learning generic context embedding with bidirectional LSTM. In: Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning, pp. 51–61 (2016)
Merity, S., Xiong, C., Bradbury, J., Socher, R.: Pointer sentinel mixture models. arXiv preprint arXiv:1609.07843 (2016)
Pennington, J., Socher, R., Manning, C.D.: GloVe: global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, pp. 1532–1543 (2014)
Peters, M.E., et al.: Deep contextualized word representations. In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 2227–2237 (2018)
Raganato, A., Bovi, C.D., Navigli, R.: Neural sequence learning models for word sense disambiguation. In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pp. 1156–1167 (2017)
Raganato, A., Camacho-Collados, J., Navigli, R.: Word sense disambiguation: a unified evaluation framework and empirical comparison. In: Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, pp. 99–110 (2017)
Schlichtkrull, M., Kipf, T.N., Bloem, P., Berg, R.V.D., Titov, I., Welling, M.: Modeling relational data with graph convolutional networks. In: Proceedings of European Semantic Web Conference, pp. 593–607 (2018)
Vashishth, S., Bhandari, M., Yadav, P., Rai, P., Bhattacharyya, C., Talukdar, P.: Incorporating syntactic and semantic information in word embeddings using graph convolutional networks. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 3308–3318 (2019)
Vaswani, A., et al.: Attention is all you need. In: Proceedings of the NIPS, pp. 6000–6010 (2017)
Xu, Y., Yang, J.: Look again at the syntax: Relational graph convolutional network for gendered ambiguous pronoun resolution. In: Proceedings of the 1st Workshop on Gender Bias in Natural Language Processing, pp. 99–104 (2019)
Yarowsky, D.: One sense per collocation. In: Proceedings of ARPA Human Language Processing Technology Workshop, pp. 266–271 (1993)
Zhong, Z., Ng, H.T.: It makes sense: a wide-coverage word sense disambiguation system for free text. In: Proceedings of the ACL 2010 System Demonstrations, pp. 78–83 (2010)
Acknowledgements
We are grateful to the anonymous reviewers for their comments and suggestions. This work was supported by the Grant-in-aid for JSPS, Grant Number 21K12026, and JKA through its promotion funds from KEIRIN RACE.
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Fukumoto, F., Mishima, T., Li, J., Suzuki, Y. (2021). Neural Local and Global Contexts Learning for Word Sense Disambiguation. In: Mantoro, T., Lee, M., Ayu, M.A., Wong, K.W., Hidayanto, A.N. (eds) Neural Information Processing. ICONIP 2021. Lecture Notes in Computer Science, vol 13111. Springer, Cham. https://doi.org/10.1007/978-3-030-92273-3_44
DOI: https://doi.org/10.1007/978-3-030-92273-3_44
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-92272-6
Online ISBN: 978-3-030-92273-3
eBook Packages: Computer Science, Computer Science (R0)