Abstract
Chinese named entity recognition (NER) is an important task in natural language processing (NLP). Most existing methods rely on traditional deep learning models that cannot fully leverage the contextual dependencies needed to capture relations between words or characters. To address this problem, language representation methods such as BERT have been proposed to learn global context information. Although these methods achieve good results, their large number of parameters limits their efficiency and applicability in real-world scenarios. To improve both performance and efficiency, this paper proposes an ALBERT-based Chinese NER method that uses ALBERT, a lite version of BERT, as the pre-trained model, reducing the parameter count through cross-layer parameter sharing while improving performance. In addition, it uses a conditional random field (CRF) to capture sentence-level correlations between words or characters and thereby alleviate tagging inconsistencies. Experimental results demonstrate that our method outperforms the comparison methods by 4.23–11.17% in relative F1-measure while using only about 4% of BERT's parameters.
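The abstract describes a standard two-stage tagging pipeline: an ALBERT encoder produces contextual token representations, a linear layer maps them to per-tag emission scores, and a CRF decodes the best tag sequence over the whole sentence. The sketch below is a minimal illustration of that architecture, not the authors' released code; it assumes PyTorch and Hugging Face Transformers, and the checkpoint name, tag count, and hand-rolled Viterbi decoder are illustrative assumptions.

```python
import torch
import torch.nn as nn
from transformers import AlbertModel  # Hugging Face Transformers


class AlbertCrfTagger(nn.Module):
    """Sketch of an ALBERT + CRF sequence tagger for NER."""

    def __init__(self, model_name: str, num_tags: int):
        super().__init__()
        # Checkpoint name is an assumption; a Chinese ALBERT checkpoint
        # (e.g. an "albert_chinese_*" model) would be used in practice.
        self.encoder = AlbertModel.from_pretrained(model_name)
        self.emissions = nn.Linear(self.encoder.config.hidden_size, num_tags)
        # CRF transition scores: transitions[i, j] = score of moving
        # from tag i at position t-1 to tag j at position t.
        self.transitions = nn.Parameter(torch.randn(num_tags, num_tags) * 0.01)

    def forward(self, input_ids, attention_mask):
        # Contextual token representations from ALBERT.
        hidden = self.encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state
        return self.emissions(hidden)  # (batch, seq_len, num_tags)

    @torch.no_grad()
    def viterbi_decode(self, emissions):
        """Best tag sequence for one sentence; emissions is (seq_len, num_tags)."""
        seq_len, num_tags = emissions.shape
        score = emissions[0]  # scores of each tag at the first position
        backpointers = []
        for t in range(1, seq_len):
            # total[i, j] = score[i] + transitions[i, j] + emissions[t, j]
            total = score.unsqueeze(1) + self.transitions + emissions[t]
            score, best_prev = total.max(dim=0)  # best previous tag for each j
            backpointers.append(best_prev)
        # Backtrack from the best final tag.
        best_tag = score.argmax().item()
        path = [best_tag]
        for best_prev in reversed(backpointers):
            best_tag = best_prev[best_tag].item()
            path.append(best_tag)
        return list(reversed(path))
```

The learned transition matrix is what lets the CRF address the tagging inconsistencies mentioned in the abstract: an invalid transition such as O followed by I-PER receives a low score, so Viterbi decoding prefers label sequences that are consistent at the sentence level rather than scoring each character independently.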