Abstract
Named entity recognition (NER) is an important and widely studied task in natural language processing. In recent years, end-to-end NER with bidirectional long short-term memory (BiLSTM) networks has received growing attention. However, BiLSTMs face three major challenges: they are difficult to parallelize, they struggle to capture long-range dependencies, and they map inputs into only a single feature space. We propose a deep neural network model based on a self-attention mechanism that supports parallel computation to address these problems. We use only a small number of BiLSTM layers to capture the sequential structure of text and then apply self-attention, which can be computed in parallel, to capture long-range dependencies. Experiments on two NER datasets show that our model produces higher-quality results and takes less training time. It achieves an F1 score of 92.63% on the MSRA portion of the SIGHAN Bakeoff 2006 for Chinese NER, improving on the previous best result by over 1.4%. On the CoNLL-2003 shared task for English NER, it achieves an F1 score of 92.17%, outperforming the previous state-of-the-art result by 0.91%.
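As a rough illustration of the architecture the abstract describes (a shallow BiLSTM followed by parallel self-attention), the following PyTorch sketch stacks one BiLSTM layer with a multi-head self-attention layer for per-token tag prediction. This is not the authors' exact implementation: all layer sizes, the number of attention heads, the residual connection with layer normalization, and the default tag count (9 tags, as in BIO-tagged CoNLL-2003) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class BiLSTMSelfAttentionTagger(nn.Module):
    """Illustrative BiLSTM + self-attention sequence tagger
    (a sketch of the general idea, not the paper's exact model)."""

    def __init__(self, vocab_size, embed_dim=128, hidden_dim=128,
                 num_heads=4, num_tags=9):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # A single, small BiLSTM captures local word order; its
        # sequential computation is the only non-parallel part.
        self.bilstm = nn.LSTM(embed_dim, hidden_dim // 2,
                              batch_first=True, bidirectional=True)
        # Multi-head self-attention relates every position to every
        # other position in one parallel step, capturing long-range
        # dependencies without a deep recurrent stack.
        self.attention = nn.MultiheadAttention(hidden_dim, num_heads,
                                               batch_first=True)
        self.norm = nn.LayerNorm(hidden_dim)
        self.classifier = nn.Linear(hidden_dim, num_tags)

    def forward(self, token_ids):
        x = self.embedding(token_ids)          # (batch, seq, embed_dim)
        h, _ = self.bilstm(x)                  # (batch, seq, hidden_dim)
        attn_out, _ = self.attention(h, h, h)  # self-attention over h
        h = self.norm(h + attn_out)            # residual + layer norm
        return self.classifier(h)              # per-token tag logits

# Usage: per-token logits for a batch of 2 sentences of length 30.
model = BiLSTMSelfAttentionTagger(vocab_size=20000)
logits = model(torch.randint(0, 20000, (2, 30)))  # shape (2, 30, 9)
```

The design trade-off matches the abstract: keeping the recurrent portion shallow limits the non-parallelizable computation, while the attention layer connects distant tokens in a single step.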







Notes
The toolkit was developed by the natural language processing laboratory at Tsinghua University. Refer to http://thulac.thunlp.org/ for more details.
Acknowledgements
We thank the reviewers for their thoughtful comments and suggestions. This work was supported by the National Natural Science Foundation of China (Grant Nos. 31771679, 31371533, 31671589), the Special Fund for Key Program of Science and Technology of Anhui Province of China (Grant Nos. 16030701092, kJ2016A836, 18030901034), the Key Laboratory of Agricultural Electronic Commerce (Grant Nos. AEC2018003, AEC2018006) and Hefei Major Research Project of Key Technology (Grant No. J2018G14).
Cite this article
Liu, X., Yang, N., Jiang, Y. et al. A parallel computing-based Deep Attention model for named entity recognition. J Supercomput 76, 814–830 (2020). https://doi.org/10.1007/s11227-019-02985-5