
A Sequence-to-Sequence Transformer Premised Temporal Convolutional Network for Chinese Word Segmentation

  • Conference paper

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 1163))

Abstract

The prevalent approaches to Chinese word segmentation rely almost exclusively on Bi-LSTM neural networks. However, Bi-LSTM-based methods have an inherent drawback: vanishing gradients, which make them inefficient at capturing information about distant characters in a long sentence. In this work, we propose a novel sequence-to-sequence transformer model for Chinese word segmentation premised on a type of convolutional neural network called the temporal convolutional network. The model uses a temporal convolutional network as the encoder, a fully-connected neural network as the decoder, and the Viterbi algorithm in an inference layer that derives the final segmentation. By stacking additional encoder layers, the model can capture information about distant characters in a long sentence. To further improve segmentation quality, the model incorporates a Conditional Random Fields model to train its parameters. Experiments on Chinese corpora show that the model outperforms the Bi-LSTM model on Chinese word segmentation and handles long sentences better than the Bi-LSTM.
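The inference layer described in the abstract can be illustrated with a minimal sketch of Viterbi decoding over the standard BMES tag set used in character-based Chinese word segmentation (B = begin of word, M = middle, E = end, S = single-character word). The emission scores (per-character tag scores, as a decoder would produce) and the transition scores (as a CRF layer would learn) are hypothetical inputs here, not values from the paper's model.

```python
TAGS = ["B", "M", "E", "S"]  # begin / middle / end of word, single-char word

def viterbi(emissions, transitions):
    """Return the highest-scoring tag sequence.

    emissions:   list (one entry per character) of per-tag score lists.
    transitions: transitions[i][j] scores moving from tag i to tag j.
    """
    n_tags = len(TAGS)
    score = list(emissions[0])        # best score ending in each tag so far
    backptr = []                      # back-pointers for path recovery
    for em in emissions[1:]:
        step = []
        new_score = []
        for j in range(n_tags):
            # best previous tag for current tag j
            best_i = max(range(n_tags),
                         key=lambda i: score[i] + transitions[i][j])
            step.append(best_i)
            new_score.append(score[best_i] + transitions[best_i][j] + em[j])
        backptr.append(step)
        score = new_score
    # trace the back-pointers from the best final tag
    best = max(range(n_tags), key=score.__getitem__)
    path = [best]
    for step in reversed(backptr):
        best = step[best]
        path.append(best)
    return [TAGS[i] for i in reversed(path)]
```

With emissions strongly favoring B, E, S at three successive positions and neutral transitions, the decoder recovers the tag sequence `["B", "E", "S"]`, i.e. a two-character word followed by a single-character word.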


Notes

  1. https://github.com/wlin12/wang2vec.


Author information

Correspondence to Yan Tang.


Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Jiang, W., Wang, Y., Tang, Y. (2020). A Sequence-to-Sequence Transformer Premised Temporal Convolutional Network for Chinese Word Segmentation. In: Shen, H., Sang, Y. (eds) Parallel Architectures, Algorithms and Programming. PAAP 2019. Communications in Computer and Information Science, vol 1163. Springer, Singapore. https://doi.org/10.1007/978-981-15-2767-8_47

Download citation

  • DOI: https://doi.org/10.1007/978-981-15-2767-8_47

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-15-2766-1

  • Online ISBN: 978-981-15-2767-8

  • eBook Packages: Computer Science (R0)
