A Better Multiway Attention Framework for Fine-Tuning

  • Conference paper
Neural Information Processing (ICONIP 2021)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1516)


Abstract

Powerful pre-trained models have attracted widespread attention, yet little of it has been devoted to solving downstream natural language understanding (NLU) tasks in the fine-tuning stage. In this paper, we propose a novel architecture for the fine-tuning stage, the multiway attention framework (MA). It takes as input the concatenated features of the first and last layers of a BERT-style model (e.g., BERT, ALBERT) together with a mean-pooling feature of the last layer. It then applies four different attention mechanisms to these input features to learn a sentence embedding at both the phrase level and the semantic level. Finally, it aggregates the outputs of the multiway attention and passes the result through self-attention to learn the best combination scheme of the multiway attention for the target task. Experimental results on the GLUE, SQuAD and RACE benchmark datasets show that our approach obtains significant performance improvements.
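The abstract describes the MA head only at a high level, so the following is a minimal PyTorch sketch of how such a fine-tuning head could be wired up. It assumes the four attention variants are the concat, bilinear, dot and minus attentions of the multiway attention networks cited as reference 8; the hidden size, pooling details, two-class task head, and all module and parameter names (e.g., MultiwayAttentionHead) are illustrative assumptions, not the authors' exact configuration.

```python
# Sketch of a multiway attention (MA) fine-tuning head, assuming the concat,
# bilinear, dot and minus attention variants of Tan et al. (reference 8).
import torch
import torch.nn as nn


class MultiwayAttentionHead(nn.Module):
    def __init__(self, hidden: int = 768, num_heads: int = 8):
        super().__init__()
        # Project the concatenation of the first and last encoder layers back to `hidden`.
        self.proj = nn.Linear(2 * hidden, hidden)
        # Parameters for the four assumed attention variants.
        self.w_concat = nn.Linear(2 * hidden, 1)
        self.w_bilinear = nn.Linear(hidden, hidden, bias=False)
        self.w_minus = nn.Linear(hidden, 1)
        # Self-attention over the four aggregated views to learn their combination.
        self.self_attn = nn.MultiheadAttention(hidden, num_heads, batch_first=True)
        self.classifier = nn.Linear(hidden, 2)  # e.g. a two-class GLUE task (illustrative)

    def forward(self, first_layer, last_layer, attention_mask):
        # first_layer, last_layer: (batch, seq_len, hidden) hidden states from a
        # BERT-style encoder; attention_mask: (batch, seq_len), 1 for real tokens.
        mask = attention_mask.unsqueeze(-1).float()
        h = self.proj(torch.cat([first_layer, last_layer], dim=-1))  # token features
        q = (last_layer * mask).sum(1) / mask.sum(1)                 # mean-pooled query

        def pooled(scores):
            # Mask padding positions, normalize, and pool token features into one vector.
            scores = scores.masked_fill(attention_mask == 0, float("-inf"))
            return (torch.softmax(scores, dim=-1).unsqueeze(-1) * h).sum(1)

        qx = q.unsqueeze(1).expand_as(h)
        views = torch.stack([
            pooled(self.w_concat(torch.cat([h, qx], dim=-1)).squeeze(-1)),  # concat attention
            pooled((self.w_bilinear(h) * qx).sum(-1)),                      # bilinear attention
            pooled((h * qx).sum(-1)),                                       # dot attention
            pooled(self.w_minus(h - qx).squeeze(-1)),                       # minus attention
        ], dim=1)                                                           # (batch, 4, hidden)

        # Let self-attention weight the four views per example, then classify.
        fused, _ = self.self_attn(views, views, views)
        return self.classifier(fused.mean(dim=1))
```

In practice the first_layer and last_layer tensors would come from a BERT-style encoder run with hidden-state outputs enabled, and the fused vector would replace the usual pooled [CLS] output as input to the task classifier; stacking the four pooled views and passing them through a standard self-attention layer is one plausible reading of the abstract's "learn the best combination scheme" step.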

Supported by Ping An Technology (Shenzhen) Co., Ltd.


References

  1. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding (2018)

  2. Yang, Z., Dai, Z., et al.: XLNet: generalized autoregressive pretraining for language understanding. In: NeurIPS (2019)

  3. Liu, Y., Ott, M., et al.: RoBERTa: a robustly optimized BERT pretraining approach (2019). arXiv preprint arXiv:1907.11692

  4. Lan, Z., Chen, M., et al.: ALBERT: a lite BERT for self-supervised learning of language representations (2019)

  5. Reimers, N., Gurevych, I.: Sentence-BERT: sentence embeddings using Siamese BERT-networks. In: EMNLP-IJCNLP, pp. 3982–3992 (2019)

  6. Jawahar, G., Sagot, B., Seddah, D.: What does BERT learn about the structure of language? In: ACL, pp. 3651–3657 (2019)

  7. Vaswani, A., Shazeer, N., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems, pp. 5998–6008 (2017)

  8. Tan, C., Wei, F., et al.: Multiway attention networks for modeling sentence pairs. In: IJCAI, pp. 4411–4417 (2018)

  9. Wang, A., Singh, A., et al.: GLUE: a multi-task benchmark and analysis platform for natural language understanding (2018). arXiv preprint arXiv:1804.07461

  10. Rajpurkar, P., Jia, R., Liang, P.: Know what you don't know: unanswerable questions for SQuAD. In: ACL, pp. 784–789 (2018)

  11. Lai, G., Xie, Q., et al.: RACE: large-scale reading comprehension dataset from examinations. In: EMNLP, pp. 785–794 (2017)

  12. Liu, X., He, P., Chen, W., Gao, J.: Multi-task deep neural networks for natural language understanding. In: ACL (2019)

  13. Sun, Y., Wang, S., Li, Y.K., Feng, S.: ERNIE 2.0: a continual pre-training framework for language understanding. AAAI 34(05), 8968–8975 (2020)

  14. Sun, C., Qiu, X., Xu, Y., Huang, X.: How to fine-tune BERT for text classification? In: Sun, M., Huang, X., Ji, H., Liu, Z., Liu, Y. (eds.) CCL 2019. LNCS (LNAI), vol. 11856, pp. 194–206. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-32381-3_16

  15. Xu, H., Zhang, L., et al.: Curriculum learning for natural language understanding. In: ACL, pp. 6095–6104 (2020)

  16. Peinelt, N., Nguyen, D., Liakata, M.: tBERT: topic models and BERT joining forces for semantic similarity detection. In: ACL, pp. 7047–7055 (2020)

  17. Luong, T., Pham, H., Manning, C.D.: Effective approaches to attention-based neural machine translation. In: EMNLP, pp. 1412–1421 (2015)

  18. Kim, J.-H., Jun, J., Zhang, B.T.: Bilinear attention networks. In: NIPS (2018)

  19. Gong, L., He, D., et al.: Efficient training of BERT by progressively stacking. In: ICML, pp. 2337–2346 (2019)

  20. Su, T.-C., Cheng, H.-C.: SesameBERT: attention for anywhere. In: DSAA (2020)

  21. Arase, Y., Tsujii, J.: Transfer fine-tuning of BERT with phrasal paraphrases. Comput. Speech Lang. 66, 101164 (2021)


Author information

Corresponding author

Correspondence to Kaifeng Hao.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Hao, K., Li, J., Hou, C., Li, P. (2021). A Better Multiway Attention Framework for Fine-Tuning. In: Mantoro, T., Lee, M., Ayu, M.A., Wong, K.W., Hidayanto, A.N. (eds) Neural Information Processing. ICONIP 2021. Communications in Computer and Information Science, vol 1516. Springer, Cham. https://doi.org/10.1007/978-3-030-92307-5_44

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-92307-5_44

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-92306-8

  • Online ISBN: 978-3-030-92307-5

  • eBook Packages: Computer Science, Computer Science (R0)
