Abstract
Most prevalent approaches to Chinese word segmentation rely on the Bi-LSTM neural network. However, Bi-LSTM-based methods have an inherent drawback, the vanishing gradient problem, which makes them inefficient at capturing distant character information in long sentences. In this work, we propose a novel sequence-to-sequence transformer model for Chinese word segmentation, premised on a type of convolutional neural network called the temporal convolutional network. The model uses the temporal convolutional network to build an encoder, uses a fully-connected neural network to build a decoder, and applies the Viterbi algorithm in an inference layer to infer the final segmentation result. The model captures distant character information in a long sentence by adding encoder layers. To achieve a better segmentation result, the model incorporates a Conditional Random Fields (CRF) model to train its parameters. Experiments on Chinese corpora show that the model outperforms the Bi-LSTM model on Chinese word segmentation and handles long sentences better than the Bi-LSTM.
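The Viterbi inference step described above can be sketched as follows. This is a minimal NumPy sketch, not the authors' implementation: it assumes a BMES character tag set, hypothetical per-character emission scores (as would come from the decoder) and a hypothetical tag-transition matrix (as would be learned by the CRF).

```python
import numpy as np

# Assumed BMES tag set for character-based segmentation
TAGS = ["B", "M", "E", "S"]

def viterbi(emissions, transitions):
    """Infer the highest-scoring tag sequence.

    emissions:   (n_chars, n_tags) per-character tag scores
    transitions: (n_tags, n_tags) tag-to-tag scores, transitions[k, j]
                 scoring a move from tag k to tag j
    """
    n, t = emissions.shape
    score = np.empty((n, t))
    back = np.zeros((n, t), dtype=int)
    score[0] = emissions[0]
    for i in range(1, n):
        # cand[k, j]: score of reaching tag j at position i from tag k
        cand = score[i - 1][:, None] + transitions + emissions[i][None, :]
        back[i] = cand.argmax(axis=0)
        score[i] = cand.max(axis=0)
    # backtrack from the best final tag
    best = [int(score[-1].argmax())]
    for i in range(n - 1, 0, -1):
        best.append(int(back[i][best[-1]]))
    return [TAGS[j] for j in reversed(best)]
```

The dynamic program is linear in sentence length, so inference cost does not depend on how the encoder (TCN or Bi-LSTM) produced the emission scores.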
Jiang, W., Wang, Y., Tang, Y. (2020). A Sequence-to-Sequence Transformer Premised Temporal Convolutional Network for Chinese Word Segmentation. In: Shen, H., Sang, Y. (eds) Parallel Architectures, Algorithms and Programming. PAAP 2019. Communications in Computer and Information Science, vol 1163. Springer, Singapore. https://doi.org/10.1007/978-981-15-2767-8_47
Print ISBN: 978-981-15-2766-1
Online ISBN: 978-981-15-2767-8