A Two-Stage Deep Neural Network for Sequence Labeling

Tan, Yongmei; Yang, Lin; Niu, Shaozhang; Zhu, Hao; Zhang, Yongheng

doi:10.1007/978-3-030-24265-7_12

Part of the book series: Lecture Notes in Computer Science (LNSC, volume 11633)

Included in the following conference series:

International Conference on Artificial Intelligence and Security

Abstract

State-of-the-art sequence labeling systems require large amounts of task-specific knowledge in the form of handcrafted features and data pre-processing, and they are built on news corpora. An English as a second language (ESL) corpus, by contrast, is collected from articles written by English learners. Such a corpus is full of grammatical mistakes, which makes sequence labeling much more difficult. We propose a two-stage deep neural network architecture for sequence labeling, which enables the higher layer to make use of the coarse-grained labeling information from the lower layer. We evaluate our model on three datasets for three sequence labeling tasks: the Penn Treebank WSJ corpus for part-of-speech (POS) tagging, the CoNLL 2003 corpus for named entity recognition (NER), and the CoNLL 2013 corpus for grammatical error correction (GEC). We obtain state-of-the-art performance on all three datasets: 97.60% accuracy for POS tagging, 91.38% F1 for NER, and, for GEC, 38% F1 for determiner error correction and 28.89% F1 for prepositional error correction. We also evaluate our system on the ESL corpus PiGai for POS tagging and obtain 96.73% accuracy. The implementation of our network is publicly available.
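
The two-stage idea in the abstract, where a higher stage consumes the coarse-grained labeling produced by a lower stage, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify layer types, sizes, or label sets, so the per-token linear layers, the random untrained weights, the dimensions, and every name below (`two_stage_tag`, `random_layer`, `FEATURE_DIM`, and so on) are assumptions chosen only to show the data flow.

```python
import math
import random


def softmax(scores):
    """Turn raw scores into a probability distribution."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, bias, x):
    """Apply one dense layer; `weights` holds one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def random_layer(rng, n_out, n_in):
    """Create untrained placeholder parameters (hypothetical helper)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def argmax(values):
    return max(range(len(values)), key=values.__getitem__)


def two_stage_tag(token_features, stage1, stage2):
    """Return (coarse_ids, fine_ids) for a sequence of token feature vectors."""
    coarse_ids, fine_ids = [], []
    for x in token_features:
        # Stage 1: coarse-grained label distribution for this token.
        coarse_probs = softmax(linear(*stage1, x))
        # Stage 2: fine-grained prediction from the original features
        # concatenated with the stage-1 coarse distribution.
        fine_probs = softmax(linear(*stage2, x + coarse_probs))
        coarse_ids.append(argmax(coarse_probs))
        fine_ids.append(argmax(fine_probs))
    return coarse_ids, fine_ids


# Hypothetical sizes: 4 features per token, 3 coarse labels, 6 fine labels.
FEATURE_DIM, N_COARSE, N_FINE = 4, 3, 6
rng = random.Random(0)
stage1 = random_layer(rng, N_COARSE, FEATURE_DIM)
stage2 = random_layer(rng, N_FINE, FEATURE_DIM + N_COARSE)
sentence = [[rng.random() for _ in range(FEATURE_DIM)] for _ in range(5)]
coarse_ids, fine_ids = two_stage_tag(sentence, stage1, stage2)
```

The only point the sketch makes is the wiring: the second stage's input width is `FEATURE_DIM + N_COARSE`, so the fine-grained decision can depend on the coarse one. A real system would replace the per-token linear layers with trained sequence encoders.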

Acknowledgments

The authors would like to thank the anonymous reviewers for their constructive comments.

Author information

Authors and Affiliations

School of Computer Science, Beijing University of Posts and Telecommunications, Beijing, 100876, China
Yongmei Tan, Shaozhang Niu, Hao Zhu & Yongheng Zhang
Beijing Sankuai Online Technology Co., Ltd., Beijing, China
Lin Yang

Corresponding author

Correspondence to Yongmei Tan.

Editor information

Editors and Affiliations

Nanjing University of Information Science and Technology, Nanjing, China
Xingming Sun
Nanjing University of Information Science and Technology, Nanjing, China
Zhaoqing Pan
Purdue University, West Lafayette, IN, USA
Elisa Bertino

About this paper

Cite this paper

Tan, Y., Yang, L., Niu, S., Zhu, H., Zhang, Y. (2019). A Two-Stage Deep Neural Network for Sequence Labeling. In: Sun, X., Pan, Z., Bertino, E. (eds) Artificial Intelligence and Security. ICAIS 2019. Lecture Notes in Computer Science, vol 11633. Springer, Cham. https://doi.org/10.1007/978-3-030-24265-7_12

DOI: https://doi.org/10.1007/978-3-030-24265-7_12
Published: 11 July 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-24264-0
Online ISBN: 978-3-030-24265-7
eBook Packages: Computer Science, Computer Science (R0)
