
Reinforcement of BERT with Dependency-Parsing Based Attention Mask

  • Conference paper
Advances in Computational Collective Intelligence (ICCCI 2022)

Abstract

The dot-product attention mechanism is among the most recent attention mechanisms and has shown outstanding performance in BERT. In this paper, we propose a dependency-parsing-based mask that reinforces the padding mask at the multi-head attention units. The padding mask is already used to filter out padding positions; the proposed mask aims to improve BERT's attention filtering. The conducted experiments show that BERT performs better with the proposed mask.
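
To make the idea concrete, below is a minimal, hypothetical sketch (not taken from the paper) of how a dependency-parsing-based mask could be combined with the usual padding mask before scaled dot-product attention. It assumes spaCy for dependency parsing and PyTorch tensors, a simplified one-token-per-position alignment, and illustrative function names (dependency_mask, padding_mask, masked_attention); the actual combination rule used by the authors may differ.

```python
# Illustrative sketch only; not the authors' implementation.
import torch
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes the small English model is installed

def dependency_mask(sentence: str, max_len: int) -> torch.Tensor:
    """(max_len, max_len) boolean mask keeping attention between each token,
    itself and its syntactic head; all other positions are masked."""
    doc = nlp(sentence)
    mask = torch.zeros(max_len, max_len, dtype=torch.bool)
    for tok in doc:
        if tok.i >= max_len:
            break
        mask[tok.i, tok.i] = True              # keep self-attention
        if tok.head.i < max_len:
            mask[tok.i, tok.head.i] = True     # token attends to its head
            mask[tok.head.i, tok.i] = True     # head attends to the token
    return mask

def padding_mask(lengths: torch.Tensor, max_len: int) -> torch.Tensor:
    """(batch, 1, 1, max_len) boolean mask: True on real tokens, False on padding."""
    positions = torch.arange(max_len).unsqueeze(0)           # (1, max_len)
    return (positions < lengths.unsqueeze(1)).unsqueeze(1).unsqueeze(1)

def masked_attention(q, k, v, pad_mask, dep_mask):
    """Scaled dot-product attention with the combined padding AND dependency mask."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5            # (batch, heads, len, len)
    combined = pad_mask & dep_mask                           # broadcasts over batch and heads
    scores = scores.masked_fill(~combined, -1e9)             # block masked positions
    return torch.softmax(scores, dim=-1) @ v
```

Under this sketch, a logical AND of the two masks means a position must be both a non-padding token and a syntactic neighbour to receive attention.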



Author information


Corresponding author

Correspondence to Toufik Mechouma.



Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Mechouma, T., Biskri, I., Meunier, J.G. (2022). Reinforcement of BERT with Dependency-Parsing Based Attention Mask. In: Bădică, C., Treur, J., Benslimane, D., Hnatkowska, B., Krótkiewicz, M. (eds) Advances in Computational Collective Intelligence. ICCCI 2022. Communications in Computer and Information Science, vol 1653. Springer, Cham. https://doi.org/10.1007/978-3-031-16210-7_9


  • DOI: https://doi.org/10.1007/978-3-031-16210-7_9


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-16209-1

  • Online ISBN: 978-3-031-16210-7

  • eBook Packages: Computer Science, Computer Science (R0)
