Abstract
Pre-trained language models have achieved great success in natural language processing, but their expensive computation makes them difficult to deploy on resource-constrained devices. This paper presents our solution to the NLPCC-2020 challenge task, Light Pre-Training Chinese Language Model for NLP (http://tcci.ccf.org.cn/conference/2020/, https://www.cluebenchmarks.com/NLPCC.html). The proposed solution, dubbed TinyNEZHA, applies a state-of-the-art BERT knowledge distillation method (TinyBERT) with an advanced Chinese pre-trained language model (NEZHA) as the teacher. In addition, we introduce several effective techniques in the fine-tuning stage to further boost TinyNEZHA's performance. In the official evaluation of the NLPCC-2020 challenge, TinyNEZHA achieves a score of 77.71, ranking 1st among all participating teams. Compared with BERT-base, TinyNEZHA obtains almost the same results while being 9× smaller and 8× faster at inference.
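To make the distillation recipe concrete, the sketch below shows a TinyBERT-style layer-wise objective in PyTorch: MSE between matched teacher and student attention matrices and hidden states (with a learned projection to bridge the width gap), plus a temperature-softened cross-entropy on the output logits, following Hinton et al. This is a minimal illustration under assumed dimensions and equal loss weights; the actual TinyNEZHA architecture and training configuration are those described in the paper, not this sketch.

import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative sizes only (assumptions, not the paper's configuration):
# a 768-wide teacher (e.g., NEZHA-base) and a narrower student.
TEACHER_HIDDEN, STUDENT_HIDDEN = 768, 312

class TinyBertStyleDistillLoss(nn.Module):
    """TinyBERT-style distillation: attention-matrix MSE and
    hidden-state MSE for one matched layer pair, plus a soft
    cross-entropy on the output logits."""

    def __init__(self, temperature: float = 1.0):
        super().__init__()
        self.t = temperature
        # Learned projection so student hidden states can be compared
        # against the wider teacher hidden states.
        self.proj = nn.Linear(STUDENT_HIDDEN, TEACHER_HIDDEN)

    def forward(self, s_attn, t_attn, s_hidden, t_hidden,
                s_logits, t_logits):
        # Layer-wise losses on one matched (student, teacher) layer pair.
        attn_loss = F.mse_loss(s_attn, t_attn)
        hidden_loss = F.mse_loss(self.proj(s_hidden), t_hidden)
        # Soft-label loss (Hinton et al.): cross-entropy between the
        # temperature-softened teacher and student distributions; the
        # t^2 factor keeps gradient magnitudes comparable across t.
        soft_targets = F.softmax(t_logits / self.t, dim=-1)
        log_probs = F.log_softmax(s_logits / self.t, dim=-1)
        kd_loss = -(soft_targets * log_probs).sum(dim=-1).mean() * self.t ** 2
        # Equal weighting here is an assumption; in practice the
        # relative weights are tunable hyperparameters.
        return attn_loss + hidden_loss + kd_loss

In practice one such loss is computed per mapped (student, teacher) layer pair and summed over all pairs; the attention tensors are directly comparable because the layer-mapping scheme pairs layers with matching head counts.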
References
NLPCC2020. http://tcci.ccf.org.cn/conference/2020/index.php. Accessed 10 Mar 2020
Cui, Y., et al.: Pre-training with whole word masking for Chinese BERT. arXiv preprint arXiv:1906.08101 (2019)
Wu, X., Lv, S., Zang, L., Han, J., Hu, S.: Conditional BERT contextual augmentation. In: Rodrigues, J.M.F., et al. (eds.) ICCS 2019. LNCS, vol. 11539, pp. 84–95. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-22747-0_7
Devlin, J., et al.: BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)
Sanh, V., et al.: DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108 (2019)
Lan, Z., et al.: ALBERT: a lite BERT for self-supervised learning of language representations. arXiv preprint arXiv:1909.11942 (2019)
Bose, A.J., Ling, H., Cao, Y.: Adversarial contrastive estimation. arXiv preprint arXiv:1805.03642 (2018)
Clark, K., et al.: ELECTRA: pre-training text encoders as discriminators rather than generators. arXiv preprint arXiv:2003.10555 (2020)
Sun, Z., et al.: MobileBERT: a compact task-agnostic bert for resource-limited devices. arXiv preprint arXiv:2004.02984 (2020)
Wei, J., et al.: NEZHA: neural contextualized representation for Chinese language understanding. arXiv preprint arXiv:1909.00204 (2019)
Sun, J.: Jieba Chinese word segmentation tool (2012). https://github.com/fxsjy/jieba
Micikevicius, P., et al.: Mixed precision training. arXiv preprint arXiv:1710.03740 (2017)
You, Y., et al.: Reducing BERT pre-training time from 3 days to 76 minutes. arXiv preprint arXiv:1904.00962 (2019)
Clark, K., et al.: What does BERT look at? An analysis of BERT’s attention. arXiv preprint arXiv:1906.04341 (2019)
Hinton, G., Vinyals, O., Dean, J.: Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 (2015)
Madry, A., et al.: Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083 (2017)
Cite this paper
Zhang, Y., Yu, J., Wang, K., Yin, Y., Chen, C., Liu, Q.: The Solution of Huawei Cloud & Noah's Ark Lab to the NLPCC-2020 Challenge: Light Pre-Training Chinese Language Model for NLP Task. In: Zhu, X., Zhang, M., Hong, Y., He, R. (eds.) Natural Language Processing and Chinese Computing. NLPCC 2020. Lecture Notes in Computer Science, vol. 12431. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-60457-8_43