Abstract
Pre-trained models (PTMs) are language models trained on large corpora that learn general language representations through their pre-training tasks. By connecting to downstream models, PTMs can complete a variety of NLP tasks without building new models from scratch, and they are therefore widely used in the NLP field. In pursuit of completing multiple tasks with a single model, the scale of PTMs has kept growing; a larger scale, however, brings more parameters, and thus more parameters to adjust during fine-tuning. This paper therefore proposes a new AABC (Adapters-ALBERT-BiLSTM-CRF) model, which introduces Adapters into ALBERT. The Adapter modules remain unchanged during the pre-training phase, and during fine-tuning adjusting only the Adapter modules is enough to achieve the best effect. To verify the reduction in tunable parameters, AABC was evaluated on three tasks: named entity recognition (NER), sentiment analysis (SA), and natural language inference (NLI). Test results show that the AABC model outperforms the BERT model on classification tasks with fewer than five classes, and that AABC also beats the rival model in terms of tuned parameters: across the 7 datasets of the SA and NLI tasks, the average number of tuned parameters of AABC is only 2.8% of that of BERT-BiLSTM-CRF. The experimental results demonstrate that the proposed method is a promising classification model.
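The core idea can be illustrated with a minimal PyTorch sketch, assuming a HuggingFace-style ALBERT encoder. The class names, the bottleneck size, and the placement of a single adapter after the encoder output (the paper inserts adapters inside the Transformer layers) are illustrative assumptions, not the authors' code.

import torch
import torch.nn as nn

class Adapter(nn.Module):
    # Houlsby-style bottleneck adapter: down-project, nonlinearity,
    # up-project, plus a residual connection.
    def __init__(self, hidden_size: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))

class AABC(nn.Module):
    # Frozen ALBERT encoder + adapter + BiLSTM producing emission scores
    # for a CRF layer (the CRF itself is omitted here for brevity).
    def __init__(self, albert: nn.Module, hidden_size: int, num_tags: int):
        super().__init__()
        self.albert = albert
        self.adapter = Adapter(hidden_size)
        self.bilstm = nn.LSTM(hidden_size, hidden_size // 2,
                              batch_first=True, bidirectional=True)
        self.emission = nn.Linear(hidden_size, num_tags)
        # Freeze every pre-trained ALBERT weight: during fine-tuning only
        # the adapter, BiLSTM, and emission layer receive gradients.
        for p in self.albert.parameters():
            p.requires_grad = False

    def forward(self, input_ids, attention_mask):
        h = self.albert(input_ids=input_ids,
                        attention_mask=attention_mask).last_hidden_state
        h, _ = self.bilstm(self.adapter(h))
        return self.emission(h)

With an ALBERT-base encoder (hidden size 768) and a bottleneck of 64, each adapter adds roughly 2 × 768 × 64 ≈ 0.1M parameters per insertion point, which is why only a small fraction of the network needs tuning.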
Acknowledgements
This work is supported by the National Natural Science Foundation of China (61373052), the 13th Five-Year Plan Science and Technology Project of the Jilin Provincial Department of Education (JJKH20200995KJ), and the Natural Science Foundation of Jilin Province (20200201447JC).
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Wang, J., Chen, Z., Niu, J., Zhang, Y. (2021). AABC: ALBERT-BiLSTM-CRF Combining with Adapters. In: Qiu, H., Zhang, C., Fei, Z., Qiu, M., Kung, S.Y. (eds.) Knowledge Science, Engineering and Management. KSEM 2021. Lecture Notes in Computer Science, vol. 12816. Springer, Cham. https://doi.org/10.1007/978-3-030-82147-0_24
DOI: https://doi.org/10.1007/978-3-030-82147-0_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-82146-3
Online ISBN: 978-3-030-82147-0
eBook Packages: Computer Science, Computer Science (R0)