Chinese Word Segmentation via BiLSTM+Semi-CRF with Relay Node

  • Regular Paper
Journal of Computer Science and Technology

Abstract

Semi-Markov conditional random fields (Semi-CRFs) have been successfully applied to many segmentation problems, including Chinese word segmentation (CWS). The advantage of a Semi-CRF lies in its inherent ability to exploit properties of whole segments rather than of individual sequence elements. Despite this theoretical advantage, Semi-CRF is still not the best choice for CWS because its computational complexity is quadratic in the sentence length. In this paper, we propose a simple yet effective framework that helps Semi-CRF achieve performance comparable to CRF-based models at similar computational complexity. Specifically, we first adopt a character-level bi-directional long short-term memory (BiLSTM) network to model context information, and then use a simple but effective fusion layer to represent segment information. In addition, to model arbitrarily long segments in linear time, we propose a new model named Semi-CRF-Relay. Because segments are modeled directly, word features are easy to incorporate, and CWS performance can be improved merely by adding publicly available pre-trained word embeddings. Experiments on four popular CWS datasets show the effectiveness of the proposed methods. The source code and pre-trained embeddings for this paper are available at https://github.com/fastnlp/fastNLP/.
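
To make the complexity claim concrete, the following is a minimal sketch of the forward recursion for a zeroth-order semi-Markov model (no label transitions, every segment is simply a "word"). It is not the authors' implementation and does not reproduce Semi-CRF-Relay; the function names and the `segment_score(start, end)` callback are hypothetical, standing in for whatever scores a segment (for example, a fusion of BiLSTM character states). With no length cap, the inner loop visits every start position, giving quadratic cost in the sentence length n; capping segments at `max_len` characters reduces this to O(n · max_len) but cannot represent longer segments.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def semi_markov_log_partition(n, segment_score, max_len=None):
    """Log-partition over all segmentations of a length-n sequence.

    segment_score(start, end) is an assumed callback returning the score
    of the segment covering positions [start, end). max_len caps the
    segment length and must be at least 1; None means no cap.
    """
    if max_len is None:
        max_len = n
    # alpha[j] = log-sum of scores of all segmentations of the prefix [0, j)
    alpha = [0.0] + [float("-inf")] * n
    for end in range(1, n + 1):
        start_min = max(0, end - max_len)
        alpha[end] = logsumexp(
            [alpha[start] + segment_score(start, end)
             for start in range(start_min, end)]
        )
    return alpha[n]


# With all scores zero, exp(log Z) counts the admissible segmentations.
uniform = lambda start, end: 0.0
print(round(math.exp(semi_markov_log_partition(4, uniform))))             # 8
print(round(math.exp(semi_markov_log_partition(4, uniform, max_len=2))))  # 5
```

With uniform scores, a four-character sentence has 2^3 = 8 segmentations in total and 5 when segments are limited to two characters, which is a quick sanity check on the recursion.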

Corresponding author

Correspondence to Xi-Peng Qiu.

Cite this article

Qun, N., Yan, H., Qiu, XP. et al. Chinese Word Segmentation via BiLSTM+Semi-CRF with Relay Node. J. Comput. Sci. Technol. 35, 1115–1126 (2020). https://doi.org/10.1007/s11390-020-9576-4

  • DOI: https://doi.org/10.1007/s11390-020-9576-4
