Abstract
The recent development of pre-trained language models (PLMs) such as BERT has come at the cost of growing computational and memory overhead. In this paper, we focus on automatic pruning for efficient BERT architectures on natural language understanding tasks. Specifically, we propose differentiable architecture pruning (DAP), which prunes redundant attention heads and hidden dimensions in BERT and draws on the strengths of both network pruning and neural architecture search. Moreover, DAP adapts to different resource constraints, allowing the pruned BERT to be deployed on a variety of edge devices. Empirical results show that the \(\text{BERT}_{\text{BASE}}\) architecture pruned by DAP achieves a \(5\times\) speed-up with only a minor performance drop. The code is available at https://github.com/OscarYau525/DAP-BERT.
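At a high level, DAP attaches differentiable gates to the structures it may prune (attention heads, hidden dimensions) and learns the gate values jointly with the task objective, in the spirit of DARTS. The snippet below is a minimal PyTorch sketch of this idea for attention heads only; the sigmoid gate parameterization, the `sparsity_penalty` regularizer, and all names are illustrative assumptions, not the paper's exact formulation.

```python
# Illustrative sketch of differentiable gating for structured attention-head
# pruning. The gate design and penalty are assumptions, not the paper's method.
import torch
import torch.nn as nn

class GatedSelfAttention(nn.Module):
    """Multi-head self-attention whose per-head outputs are scaled by
    learnable gates; gates driven toward zero mark heads as prunable."""
    def __init__(self, dim=768, num_heads=12):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.qkv = nn.Linear(dim, 3 * dim)
        self.proj = nn.Linear(dim, dim)
        # One architecture parameter per head; sigmoid keeps gates in (0, 1).
        self.gate_logits = nn.Parameter(torch.zeros(num_heads))

    def forward(self, x):
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Reshape to (batch, heads, tokens, head_dim).
        q, k, v = (z.view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
                   for z in (q, k, v))
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.head_dim ** 0.5, dim=-1)
        out = attn @ v                            # per-head outputs
        gates = torch.sigmoid(self.gate_logits)   # differentiable head mask
        out = out * gates.view(1, -1, 1, 1)       # soft-prune each head
        return self.proj(out.transpose(1, 2).reshape(b, t, d))

def sparsity_penalty(model, weight=1e-3):
    """L1 pressure on the gates; added to the task loss, it pushes
    redundant heads toward zero during the search phase."""
    gates = [torch.sigmoid(m.gate_logits)
             for m in model.modules() if isinstance(m, GatedSelfAttention)]
    return weight * torch.cat(gates).sum()
```

After the search phase, heads whose gates fall below a threshold would be removed outright and the compact network fine-tuned; the same gating idea extends naturally to the hidden dimensions of the feed-forward layers.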
Notes
1. To obtain the task-specific parameters, we follow the standard fine-tuning pipeline at https://huggingface.co/bert-base-uncased (see the sketch after these notes).
2. Task-specific model parameters are available at https://github.com/huawei-noah/Pretrained-Language-Model/tree/master/TinyBERT.
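For concreteness, here is a minimal sketch of the standard fine-tuning setup referenced in note 1, using the Hugging Face `transformers` library; the task (binary classification) and the toy input are illustrative assumptions.

```python
# Minimal sketch of the standard fine-tuning pipeline for bert-base-uncased;
# the label count and example input are illustrative, not the paper's setup.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)  # e.g. a binary GLUE task

batch = tokenizer(["a fine movie"], return_tensors="pt", padding=True)
outputs = model(**batch, labels=torch.tensor([1]))
outputs.loss.backward()  # standard cross-entropy; plug into any optimizer loop
```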
Acknowledgement
The work described in this paper was partially supported by the National Key Research and Development Program of China (No. 2018AAA0100204) and the Research Grants Council of the Hong Kong Special Administrative Region, China (No. CUHK 14210920 of the General Research Fund).