Neural Local and Global Contexts Learning for Word Sense Disambiguation

  • Conference paper
Neural Information Processing (ICONIP 2021)

Abstract

Supervised Word Sense Disambiguation (WSD) is one of the popular NLP topics, yet how to make use of the limited volume of sense-tagged data and interpret diverse contexts as relevant features remains a challenging research question. This paper addresses the problem and proposes a method for effectively incorporating a variety of contexts into a neural WSD model. Our model is a Transformer-XL framework coupled with Graph Convolutional Networks (GCNs). The GCNs integrate different features from local contexts, i.e., full dependency structures, words with part-of-speech (POS) tags, and word order information, into the model. Using the hidden states obtained by the GCNs, Transformer-XL learns local and global contexts simultaneously, where the global context is obtained from the document in which the target words appear. Experimental results on a series of benchmark WSD datasets show that our method is comparable to state-of-the-art WSD methods that utilize only a limited amount of sense-tagged data. In particular, an ablation test verified that the dependency structure and POS features contribute to the performance improvement of our model.

Notes

  1. We used the 39 dependency labels provided by the Stanford CoreNLP syntactic parser for the first two types of flows, plus two types of word order and self-loops, which results in 81 (39 \(\times\) 2 + 2 + 1) different matrices in every layer.

  2. https://github.com/pfnet/optuna.
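
The count of 81 matrices in Note 1 can be checked with a minimal sketch. It assumes that the "first two types of flows" are the two directions of each dependency arc (head to dependent and dependent to head); the page does not spell this out. The function name `build_relation_index`, the relation names, and the placeholder labels `dep_0` ... `dep_38` are hypothetical and are not taken from the paper or from Stanford CoreNLP.

```python
from typing import Dict, List, Optional, Tuple

# A relation is (kind, dependency label); the label is None for
# word-order edges and self-loops. All names here are illustrative.
Relation = Tuple[str, Optional[str]]


def build_relation_index(dep_labels: List[str]) -> Dict[Relation, int]:
    """Assign one weight-matrix index to every edge type in a layer."""
    relations: List[Relation] = []
    for label in dep_labels:
        # Assumption: the two flows are the two directions of an arc.
        relations.append(("dep_forward", label))   # head -> dependent
        relations.append(("dep_backward", label))  # dependent -> head
    relations.append(("order_next", None))  # word i -> word i + 1
    relations.append(("order_prev", None))  # word i -> word i - 1
    relations.append(("self_loop", None))   # word i -> word i
    return {relation: i for i, relation in enumerate(relations)}


if __name__ == "__main__":
    # Placeholder names standing in for the 39 parser labels.
    labels = [f"dep_{i}" for i in range(39)]
    index = build_relation_index(labels)
    print(len(index))  # 39 * 2 + 2 + 1 = 81
```

Under these assumptions each entry of the index would select one of the per-layer matrices, so the dictionary size matches the figure given in the note.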

Acknowledgements

We are grateful to the anonymous reviewers for their comments and suggestions. This work was supported by the Grant-in-aid for JSPS, Grant Number 21K12026, and JKA through its promotion funds from KEIRIN RACE.

Author information

Corresponding author

Correspondence to Fumiyo Fukumoto.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Fukumoto, F., Mishima, T., Li, J., Suzuki, Y. (2021). Neural Local and Global Contexts Learning for Word Sense Disambiguation. In: Mantoro, T., Lee, M., Ayu, M.A., Wong, K.W., Hidayanto, A.N. (eds) Neural Information Processing. ICONIP 2021. Lecture Notes in Computer Science, vol 13111. Springer, Cham. https://doi.org/10.1007/978-3-030-92273-3_44

  • DOI: https://doi.org/10.1007/978-3-030-92273-3_44

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-92272-6

  • Online ISBN: 978-3-030-92273-3

  • eBook Packages: Computer Science (R0)
