Abstract
Chinese Named Entity Recognition (NER), as one of the basic natural language processing tasks, remains a tough problem due to the polysemy and complexity of Chinese. In recent years, most previous works have treated NER as a sequence tagging task, using either statistical models or deep learning methods. In this paper, we instead consider NER as a sequence transformation task in which unlabeled sequences (source texts) are converted into labeled sequences (NER labels). To model this task, we design a sequence-to-sequence neural network that combines a Conditional Random Fields (CRF) layer, which efficiently exploits sentence-level tag information, with an attention mechanism that captures the most important semantic information in the encoded sequence. In experiments, we evaluate different models both on a standard corpus of news data and on an unnormalized corpus of short messages. Experimental results show that our model outperforms state-of-the-art methods at recognizing short, interdependent entities.
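The CRF layer mentioned above scores whole tag sequences rather than individual tokens, so decoding picks the globally best path via the Viterbi algorithm. The following is a minimal, self-contained sketch of that decoding step; the emission scores, transition scores, and toy tag set (O, B-PER, I-PER) are hypothetical stand-ins for what the paper's seq2seq decoder and learned CRF parameters would produce.

```python
# Minimal sketch of CRF Viterbi decoding over per-token tag scores.
# All scores below are hypothetical; in the actual model, emission scores
# come from the decoder and the transition matrix is learned.

def viterbi_decode(emissions, transitions):
    """emissions: [T][K] per-token tag scores; transitions: [K][K] score
    of moving from tag i to tag j. Returns the best-scoring tag sequence."""
    n_tags = len(emissions[0])
    score = list(emissions[0])   # best path score ending in each tag so far
    backpointers = []
    for emit in emissions[1:]:
        new_score, bp = [], []
        for j in range(n_tags):
            best_i = max(range(n_tags),
                         key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
            bp.append(best_i)
        score = new_score
        backpointers.append(bp)
    # Backtrack from the best final tag.
    best_last = max(range(n_tags), key=lambda j: score[j])
    path = [best_last]
    for bp in reversed(backpointers):
        path.append(bp[path[-1]])
    path.reverse()
    return path

# Toy example: tags 0=O, 1=B-PER, 2=I-PER over three tokens.
emissions = [[2.0, 3.0, 0.0],    # token 1 favours B-PER
             [1.0, 0.0, 2.5],    # token 2 favours I-PER
             [3.0, 0.5, 0.5]]    # token 3 favours O
transitions = [[0.0, 0.0, -5.0],   # O -> I-PER is penalised
               [0.0, -5.0, 1.0],   # B-PER -> I-PER is rewarded
               [0.5, 0.0, 0.0]]
print(viterbi_decode(emissions, transitions))  # -> [1, 2, 0]
```

This sentence-level decoding is what lets a CRF layer rule out invalid tag transitions (such as an I-PER tag with no preceding B-PER) that a per-token softmax would happily emit.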
Acknowledgement
This work was supported by the National Key Research and Development Program of China (No. 2016YFB0801300) and the National Natural Science Foundation of China (Grant No. 61602466).
Copyright information
© 2018 Springer Nature Switzerland AG
Cite this paper
Wang, Q., Song, Y., Liu, H., Cao, Y., Liu, Y., Guo, L. (2018). A Sequence Transformation Model for Chinese Named Entity Recognition. In: Liu, W., Giunchiglia, F., Yang, B. (eds) Knowledge Science, Engineering and Management. KSEM 2018. Lecture Notes in Computer Science(), vol 11061. Springer, Cham. https://doi.org/10.1007/978-3-319-99365-2_44
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99364-5
Online ISBN: 978-3-319-99365-2