Abstract
Pre-trained models (PTMs) are language models trained on large corpora that learn general language representations through their pre-training tasks. By connecting to downstream models, PTMs can complete a variety of NLP tasks without building new models from scratch, and they are therefore widely used in the NLP field. In pursuit of completing multiple tasks with a single model, the scale of PTMs has kept growing; a larger scale, however, brings more parameters, and thus more parameters to adjust during fine-tuning. This paper therefore proposes a new AABC (Adapters-ALBERT-BiLSTM-CRF) model, which introduces Adapters into ALBERT. The Adapter modules remain unchanged during the pre-training phase, and during fine-tuning adjusting only the Adapter modules is enough to achieve the best effect. To verify the reduction in tunable parameters, AABC was evaluated on three tasks: named entity recognition (NER), sentiment analysis (SA), and natural language inference (NLI). Test results show that the AABC model outperforms the BERT model on classification tasks with fewer than five classes, and that AABC also beats the rival model in terms of tuned parameters: across the 7 datasets of the SA and NLI tasks, the average number of tuned parameters of AABC is only 2.8% of that of BERT-BiLSTM-CRF. The experimental results demonstrate that the proposed method is a promising classification model.
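The core idea can be illustrated with a minimal PyTorch sketch, assuming a HuggingFace-style ALBERT encoder. The class names, the bottleneck size, and the placement of a single adapter after the encoder output (the paper inserts adapters inside the Transformer layers) are illustrative assumptions, not the authors' code.

import torch
import torch.nn as nn

class Adapter(nn.Module):
    # Houlsby-style bottleneck adapter: down-project, nonlinearity,
    # up-project, plus a residual connection.
    def __init__(self, hidden_size: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))

class AABC(nn.Module):
    # Frozen ALBERT encoder + adapter + BiLSTM producing emission scores
    # for a CRF layer (the CRF itself is omitted here for brevity).
    def __init__(self, albert: nn.Module, hidden_size: int, num_tags: int):
        super().__init__()
        self.albert = albert
        self.adapter = Adapter(hidden_size)
        self.bilstm = nn.LSTM(hidden_size, hidden_size // 2,
                              batch_first=True, bidirectional=True)
        self.emission = nn.Linear(hidden_size, num_tags)
        # Freeze every pre-trained ALBERT weight: during fine-tuning only
        # the adapter, BiLSTM, and emission layer receive gradients.
        for p in self.albert.parameters():
            p.requires_grad = False

    def forward(self, input_ids, attention_mask):
        h = self.albert(input_ids=input_ids,
                        attention_mask=attention_mask).last_hidden_state
        h, _ = self.bilstm(self.adapter(h))
        return self.emission(h)

With an ALBERT-base encoder (hidden size 768) and a bottleneck of 64, each adapter adds roughly 2 × 768 × 64 ≈ 0.1M parameters per insertion point, which is why only a small fraction of the network needs tuning.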
Acknowledgements
This work is supported by the National Natural Science Foundation of China (61373052), the 13th Five-Year Plan Science and Technology Project of the Jilin Provincial Department of Education (JJKH20200995KJ), and the Natural Science Foundation of Jilin Province (20200201447JC).
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Wang, J., Chen, Z., Niu, J., Zhang, Y. (2021). AABC: ALBERT-BiLSTM-CRF Combining with Adapters. In: Qiu, H., Zhang, C., Fei, Z., Qiu, M., Kung, S.Y. (eds.) Knowledge Science, Engineering and Management. KSEM 2021. Lecture Notes in Computer Science, vol. 12816. Springer, Cham. https://doi.org/10.1007/978-3-030-82147-0_24
DOI: https://doi.org/10.1007/978-3-030-82147-0_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-82146-3
Online ISBN: 978-3-030-82147-0
eBook Packages: Computer Science, Computer Science (R0)