Abstract
Word segmentation and part-of-speech tagging are two preliminary but fundamental components of Chinese natural language processing. With the upsurge of deep learning, end-to-end models are built without handcrafted features. In this work, we model Chinese word segmentation and part-of-speech tagging jointly on the basis of state-of-the-art BiRNN-CRF architecture. LSTM is adopted as the basic recurrent unit. Apart from utilizing pre-trained character embeddings and trigram features, we incorporate neural language model and conduct multi-task training. Highway layers are applied to tackle the discordance issue of the naive co-training. Experimental results on CTB5, CTB7, and PPD datasets show the effectiveness of the proposed method.
Keywords
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Notes
References
Abadi, M. et al.: Tensorflow: a system for large-scale machine learning. In: Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation, OSDI 2016, pp. 265–283. USENIX Association, Berkeley (2016)
Chen, X., Qiu, X., Huang, X.: A long dependency aware deep architecture for joint Chinese word segmentation and POS tagging. CoRR abs/1611.05384 (2016)
Chung, J., Gülçehre, Ç., Cho, K., Bengio, Y.: Empirical evaluation of gated recurrent neural networks on sequence modeling. CoRR abs/1412.3555 (2014)
Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011)
Durme, B.V., Rastogi, P., Poliak, A., Martin, M.P.: Efficient, compositional, order-sensitive n-gram embeddings. In: EACL (2017)
Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward neural networks. In: Teh, Y.W., Titterington, M. (eds.) Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, Proceedings of Machine Learning Research, PMLR, Chia Laguna Resort, Sardinia, Italy, vol. 9, pp. 249–256, 13–15 May 2010
Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997). https://doi.org/10.1162/neco.1997.9.8.1735
Jiang, W., Huang, L., Liu, Q., Lü, Y.: A cascaded linear model for joint chinese word segmentation and part-of-speech tagging. In: Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics (2008)
Jin, G., Chen, X.: The fourth international Chinese language processing bakeoff: Chinese word segmentation, named entity recognition and Chinese POS tagging. In: IJCNLP (2008)
Kruengkrai, C., Uchimoto, K., Kazama, J., Wang, Y., Torisawa, K., Isahara, H.: Joint Chinese word segmentation and POS tagging using an error-driven word-character hybrid model. IEICE Trans. Inf. Syst. E92(D12), 2298–2305 (2009). https://doi.org/10.1587/transinf.E92.D.2298
Lafferty, J.D., McCallum, A., Pereira, F.C.N.: Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: Proceedings of the Eighteenth International Conference on Machine Learning, ICML 2001, pp. 282–289. Morgan Kaufmann Publishers Inc., San Francisco (2001)
Li, B., Liu, T., Zhao, Z., Wang, P., Du, X.: Neural bag-of-ngrams. In: AAAI, pp. 3067–3074 (2017)
Li, Y., Li, W., Sun, F., Li, S.: Component-enhanced Chinese character embeddings. CoRR abs/1508.06669 (2015)
Ling, W., Luís, T., Marujo, L., Astudillo, R.F., Amir, S., Dyer, C., Black, A.W., Trancoso, I.: Finding function in form: Compositional character models for open vocabulary word representation. CoRR abs/1508.02096 (2015)
Liu, L., Shang, J., Xu, F.F., Ren, X., Gui, H., Peng, J., Han, J.: Empower sequence labeling with task-aware neural language model. CoRR abs/1709.04109 (2017)
Ma, X., Hovy, E.H.: End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF. CoRR abs/1603.01354 (2016)
McCallum, A., Freitag, D., Pereira, F.C.N.: Maximum entropy markov models for information extraction and segmentation. In: Proceedings of the Seventeenth International Conference on Machine Learning, ICML 2000, pp. 591–598. Morgan Kaufmann Publishers Inc., San Francisco (2000)
Ng, H.T., Low, J.K.: Chinese part-of-speech tagging: one-at-a-time or all-at-once? word-based or character-based? In: Lin, D., Wu, D. (eds.) Proceedings of EMNLP 2004, pp. 277–284. Association for Computational Linguistics, Barcelona, July 2004
Pascanu, R., Mikolov, T., Bengio, Y.: Understanding the exploding gradient problem. CoRR abs/1211.5063 (2012)
Peng, F., Feng, F., McCallum, A.: Chinese segmentation and new word detection using conditional random fields. In: Proceedings of the 20th International Conference on Computational Linguistics, COLING 2004. Association for Computational Linguistics, Stroudsburg (2004). https://doi.org/10.3115/1220355.1220436
Pennington, J., Socher, R., Manning, C.D.: Glove: global vectors for word representation. In: Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543 (2014)
Peters, M.E., Ammar, W., Bhagavatula, C., Power, R.: Semi-supervised sequence tagging with bidirectional language models. CoRR abs/1705.00108 (2017)
Rei, M.: Semi-supervised multitask learning for sequence labeling. CoRR abs/1704.07156 (2017)
Shao, Y., Hardmeier, C., Tiedemann, J., Nivre, J.: Character-based joint segmentation and POS tagging for Chinese using bidirectional RNN-CRF. CoRR abs/1704.01314 (2017)
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1929–1958 (2014)
Srivastava, R.K., Greff, K., Schmidhuber, J.: Highway networks. CoRR abs/1505.00387 (2015)
Sun, Y., Lin, L., Tang, D., Yang, N., Ji, Z., Wang, X.: Radical-enhanced Chinese character embedding. CoRR abs/1404.4714 (2014)
Uchiumi, K., Tsukahara, H., Mochihashi, D.: Inducing word and part-of-speech with Pitman-Yor hidden semi-Markov models. In: ACL (2015)
Vajjala, S., Banerjee, S.: A study of n-gram and embedding representations for native language identification. In: Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications, pp. 240–248 (2017)
Wang, Y., Kazama, J., Tsuruoka, Y., Chen, W., Zhang, Y., Torisawa, K.: Improving Chinese word segmentation and POS tagging with semi-supervised methods using large auto-analyzed data. In: IJCNLP (2011)
Wieting, J., Bansal, M., Gimpel, K., Livescu, K.: Charagram: Embedding words and sentences via character n-grams. CoRR abs/1607.02789 (2016)
Yang, Z., Salakhutdinov, R., Cohen, W.W.: Transfer learning for sequence tagging with hierarchical recurrent networks. CoRR abs/1703.06345 (2017)
Zhang, Y., Clark, S.: Joint word segmentation and POS tagging using a single perceptron. In: Proceedings of ACL-08: HLT, pp. 888–896. Association for Computational Linguistics, Columbus, June 2008
Zhang, Y., Clark, S.: A fast decoder for joint word segmentation and POS-tagging using a single discriminative model. In: Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, EMNLP 2010, pp. 843–852. Association for Computational Linguistics, Stroudsburg (2010)
Acknowledgement
This research work has been funded by the National Natural Science Foundation of China (Grant No.61772337, U1736207 and 61472248), the SJTU-Shanghai Songheng Content Analysis Joint Lab, and program of Shanghai Technology Research Leader (Grant No.16XD1424400).
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Zhang, J., Liu, G., Zhou, J., Zhou, C., Sun, H. (2018). LM Enhanced BiRNN-CRF for Joint Chinese Word Segmentation and POS Tagging. In: Zhang, M., Ng, V., Zhao, D., Li, S., Zan, H. (eds) Natural Language Processing and Chinese Computing. NLPCC 2018. Lecture Notes in Computer Science(), vol 11109. Springer, Cham. https://doi.org/10.1007/978-3-319-99501-4_9
Download citation
DOI: https://doi.org/10.1007/978-3-319-99501-4_9
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99500-7
Online ISBN: 978-3-319-99501-4
eBook Packages: Computer ScienceComputer Science (R0)