
AABC:ALBERT-BiLSTM-CRF Combining with Adapters

  • Conference paper
Knowledge Science, Engineering and Management (KSEM 2021)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12816)


Abstract

Pre-trained models (PTMs) are language models pre-trained on a large corpus, which learn general language representations through their training tasks. PTMs complete various NLP tasks by connecting with downstream models, which avoids building new models from scratch; they are therefore widely used in the NLP field. To pursue the goal of completing multiple tasks with a single model, the scale of PTMs has kept growing. However, a larger scale brings a larger number of parameters, and thus more parameters to adjust later. This article therefore proposes a new AABC (Adapters-ALBERT-BiLSTM-CRF) model, which introduces Adapters into ALBERT. The Adapters are left untouched during the pre-training phase, and during fine-tuning adjusting only the Adapter modules is sufficient to achieve the best effect. To verify the reduction in the model's tuned parameters, AABC was evaluated on three tasks: named entity recognition (NER), sentiment analysis (SA), and natural language inference (NLI). Test results show that the AABC model performs better than the BERT model on classification tasks with fewer than five classes. AABC also surpasses the rival model in terms of tuned parameters: on the 7 datasets of the SA and NLI tasks, the average number of tuned parameters of AABC is only 2.8% of that of BERT-BiLSTM-CRF. The experimental results demonstrate that the proposed method is a promising classification model.
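The abstract does not spell out the Adapter internals, but the commonly used bottleneck adapter inserts a small down-projection/up-projection residual module between frozen transformer layers, so that fine-tuning touches only the adapter weights. The following NumPy sketch illustrates that idea; the function names, dimensions, and ReLU nonlinearity are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def adapter(h, w_down, w_up):
    """Bottleneck adapter: down-project, nonlinearity, up-project, residual add.

    In adapter-based fine-tuning, only w_down and w_up are trained;
    the surrounding transformer (e.g. ALBERT) weights stay frozen.
    """
    z = np.maximum(0.0, h @ w_down)  # ReLU stands in for the usual GELU
    return h + z @ w_up              # residual connection around the bottleneck

rng = np.random.default_rng(0)
d_model, d_bottleneck = 768, 64      # illustrative sizes
h = rng.normal(size=(4, d_model))    # a batch of hidden states
w_down = 0.02 * rng.normal(size=(d_model, d_bottleneck))
w_up = np.zeros((d_bottleneck, d_model))  # near-zero init: adapter starts as a no-op

out = adapter(h, w_down, w_up)
assert out.shape == h.shape
assert np.allclose(out, h)           # with w_up = 0, the adapter passes h through unchanged

# Parameter-count intuition: one adapter adds about 2 * d_model * d_bottleneck
# weights, a small fraction of the d_model x d_model matrices it sits between.
```

The residual connection plus near-zero initialization lets the adapter start as an identity mapping, so inserting it does not disturb the pre-trained representations; this is one plausible route to the small tuned-parameter budget the abstract reports.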



Acknowledgements

This work is supported by the National Natural Science Foundation of China (61373052), the 13th Five-Year Plan Science and Technology Project of the Jilin Provincial Department of Education (JJKH20200995KJ), and the Natural Science Foundation of Jilin Province (20200201447JC).

Author information

Corresponding author

Correspondence to YongGang Zhang.



Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Wang, J., Chen, Z., Niu, J., Zhang, Y. (2021). AABC:ALBERT-BiLSTM-CRF Combining with Adapters. In: Qiu, H., Zhang, C., Fei, Z., Qiu, M., Kung, S.Y. (eds) Knowledge Science, Engineering and Management. KSEM 2021. Lecture Notes in Computer Science, vol 12816. Springer, Cham. https://doi.org/10.1007/978-3-030-82147-0_24

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-82147-0_24

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-82146-3

  • Online ISBN: 978-3-030-82147-0

  • eBook Packages: Computer Science, Computer Science (R0)
