A New Fine-Tuning Architecture Based on Bert for Word Relation Extraction

  • Conference paper
  • First Online:
Natural Language Processing and Chinese Computing (NLPCC 2019)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11839)

Abstract

We introduce a new attention-based neural architecture for fine-tuning Bidirectional Encoder Representations from Transformers (BERT) on semantic and grammatical relation classification at the word level. BERT has been widely adopted as the basis for state-of-the-art models on sentence-level and token-level natural language processing tasks via a fine-tuning process that typically feeds the final hidden states into a classification layer. Inspired by the Residual Net, we propose in this paper a new architecture that augments the final hidden states with the multi-head attention weights from all Transformer layers for fine-tuning. We give a theoretical rationale for this proposal and compare it with recent models for word-level relation tasks such as dependency tree parsing. The resulting model shows a clear improvement over the standard BERT fine-tuning model on the dependency parsing task with the English TreeBank data and on the semantic relation extraction task of SemEval-2010 Task 8.
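
As a rough illustration of the idea in the abstract, the sketch below shows one way to augment BERT's final hidden states with multi-head attention weights from all Transformer layers before a classification layer. This is a minimal PyTorch/Hugging Face sketch under stated assumptions: the class name, the choice to score a hypothetical (head, dependent) word pair, and the simple concatenation-based feature design are illustrative assumptions, not the authors' exact architecture.

# Minimal sketch (illustrative only): combine the final hidden states of two
# word positions with the attention weights between them from every layer
# and head, then classify the relation.
import torch
import torch.nn as nn
from transformers import BertModel


class AttentionAugmentedPairClassifier(nn.Module):
    def __init__(self, num_labels, model_name="bert-base-uncased"):
        super().__init__()
        # output_attentions=True makes the model return per-layer attention maps.
        self.bert = BertModel.from_pretrained(model_name, output_attentions=True)
        hidden = self.bert.config.hidden_size            # 768 for bert-base
        layers = self.bert.config.num_hidden_layers      # 12
        heads = self.bert.config.num_attention_heads     # 12
        # Features: hidden states of the two words plus one attention weight
        # per (layer, head) between them -- an assumed, simple feature design.
        self.classifier = nn.Linear(2 * hidden + layers * heads, num_labels)

    def forward(self, input_ids, head_idx, dep_idx, attention_mask=None):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        h = out.last_hidden_state                        # (batch, seq, hidden)
        batch = torch.arange(h.size(0), device=h.device)
        h_head, h_dep = h[batch, head_idx], h[batch, dep_idx]
        # out.attentions is a tuple of (batch, heads, seq, seq), one per layer.
        att = torch.stack(out.attentions, dim=1)         # (batch, layers, heads, seq, seq)
        # Keep the attention weight from the dependent position to the head
        # position in every layer and every head.
        att_pair = att[batch, :, :, dep_idx, head_idx]   # (batch, layers, heads)
        feats = torch.cat([h_head, h_dep, att_pair.flatten(1)], dim=-1)
        return self.classifier(feats)

Compared with the standard fine-tuning head, which only sees the final hidden states, a head of this kind also receives a direct signal of how strongly every layer and head attends between the two words, which is the sort of information the proposed architecture is meant to exploit.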


References

  1. Chen, D., Manning, C.: A fast and accurate dependency parser using neural networks. In: EMNLP, pp. 740–750 (2014)

  2. Dai, A.M., Le, Q.V.: Semi-supervised sequence learning. CoRR abs/1511.01432 (2015)

  3. Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. CoRR abs/1810.04805 (2018)

  4. Dozat, T., Manning, C.D.: Deep biaffine attention for neural dependency parsing. CoRR abs/1611.01734 (2016)

  5. Hashimoto, K., Xiong, C., Tsuruoka, Y., Socher, R.: A joint many-task model: Growing a neural network for multiple NLP tasks. CoRR abs/1611.01587 (2016)

  6. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. CoRR abs/1512.03385 (2015)

  7. Hendrickx, I., Kim, S., Kozareva, Z., Nakov, P., Padó, S., Pennacchiotti, M., Romano, L., Szpakowicz, S.: SemEval-2010 Task 8: multi-way classification of semantic relations between pairs of nominals, pp. 33–38 (2010)

  8. Howard, J., Ruder, S.: Fine-tuned language models for text classification. CoRR abs/1801.06146 (2018)

  9. Kiperwasser, E., Goldberg, Y.: Simple and accurate dependency parsing using bidirectional LSTM feature representations. CoRR abs/1603.04351 (2016)

  10. de Marneffe, M.C., Manning, C.: Stanford typed dependencies manual (2008)

  11. McDonald, R., Pereira, F., Ribarov, K., Hajič, J.: Non-projective dependency parsing using spanning tree algorithms. In: Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, HLT 2005, pp. 523–530. Association for Computational Linguistics, Stroudsburg, PA, USA (2005). https://doi.org/10.3115/1220575.1220641

  12. Peters, M.E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., Zettlemoyer, L.: Deep contextualized word representations. CoRR abs/1802.05365 (2018)

  13. Radford, A., Narasimhan, K., Salimans, T., Sutskever, I.: Improving language understanding by generative pre-training (2018)

  14. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., Polosukhin, I.: Attention is all you need. CoRR abs/1706.03762 (2017)

  15. Wang, L., Cao, Z., de Melo, G., Liu, Z.: Relation classification via multi-level attention CNNs, pp. 1298–1307 (2016). https://doi.org/10.18653/v1/P16-1123

Author information

Corresponding author

Correspondence to Fanyu Meng.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Meng, F., Feng, J., Yin, D., Hu, M. (2019). A New Fine-Tuning Architecture Based on Bert for Word Relation Extraction. In: Tang, J., Kan, M.Y., Zhao, D., Li, S., Zan, H. (eds) Natural Language Processing and Chinese Computing. NLPCC 2019. Lecture Notes in Computer Science (LNAI), vol. 11839. Springer, Cham. https://doi.org/10.1007/978-3-030-32236-6_29

  • DOI: https://doi.org/10.1007/978-3-030-32236-6_29

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-32235-9

  • Online ISBN: 978-3-030-32236-6

  • eBook Packages: Computer Science, Computer Science (R0)
