Multi-task learning with helpful word selection for lexicon-enhanced Chinese NER

Applied Intelligence

Abstract

Named entity recognition (NER) is a common task in natural language processing, but it remains more challenging in Chinese because Chinese text lacks natural word delimiters. Recently, many works have incorporated an external lexicon into character-level Chinese NER, focusing on how to integrate the words matched in the lexicon into a specific model such as an LSTM or a Transformer. In this setting, however, performance depends strongly on the quality of the lexicon and on how well the lexicon matches the corpus. In practice, some of the words provided by the lexicon are noise and do not help Chinese NER. To address this issue, we propose a simple but effective multi-task learning method with helpful word selection for lexicon-enhanced Chinese NER. One task scores the matched words and selects the top-K most helpful ones. The other task integrates the selected words through a multi-head attention network and performs Chinese NER by character-level sequence labeling. The two tasks are learned jointly with a shared encoder. Experiments on three public datasets demonstrate that the proposed method outperforms recent advanced baselines.
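As a rough illustration of the word-selection step described in the abstract, the sketch below matches lexicon words against a character sequence and keeps the top-K by score. It is not the authors' implementation: the function names `match_words` and `select_top_k`, the toy lexicon, and the hard-coded scores are all assumptions made for this example. In the paper the scores come from a scorer learned jointly with the NER task, and the selected words are then integrated by multi-head attention, which this sketch does not attempt to reproduce.

```python
from typing import Callable, List, Set, Tuple

# A match is (start index, end index exclusive, word). Hypothetical type alias.
Match = Tuple[int, int, str]


def match_words(sentence: str, lexicon: Set[str], max_len: int = 4) -> List[Match]:
    """Return every substring of 2..max_len characters found in the lexicon."""
    matches = []
    for start in range(len(sentence)):
        last_end = min(len(sentence), start + max_len)
        for end in range(start + 2, last_end + 1):
            word = sentence[start:end]
            if word in lexicon:
                matches.append((start, end, word))
    return matches


def select_top_k(matches: List[Match],
                 score: Callable[[Match], float],
                 k: int) -> List[Match]:
    """Keep the k highest-scoring matches and return them in sentence order."""
    ranked = sorted(matches, key=score, reverse=True)[:k]
    return sorted(ranked)


# Toy inputs (assumptions for illustration only).
sentence = "南京市长江大桥"
lexicon = {"南京", "南京市", "市长", "长江", "大桥", "长江大桥", "江大桥"}

# Stand-in for a learned scorer: fixed helpfulness scores per word.
toy_scores = {
    "南京市": 0.9, "长江大桥": 0.8, "南京": 0.6, "长江": 0.5,
    "大桥": 0.4, "市长": 0.1, "江大桥": 0.05,
}

matches = match_words(sentence, lexicon)
selected = select_top_k(matches, lambda m: toy_scores[m[2]], k=3)
selected_words = [word for _, _, word in selected]
print(selected_words)
```

With these toy scores, the misleading match "市长" (mayor), which straddles the boundary between "南京市" and "长江大桥", is filtered out before any integration step. This is the kind of lexicon noise the abstract refers to.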

Data Availability

In this paper, we conducted the experiments based on three public datasets: Ontonotes4, Weibo and Resume. The data availability statements are as follows.

- Ontonotes4, which supports the findings of this study, is available from the Linguistic Data Consortium (https://catalog.ldc.upenn.edu/LDC2011T03), but restrictions apply to the availability of these data, which were used under license for the current study and so are not publicly available. The data are, however, available from the authors upon reasonable request and with the permission of the Linguistic Data Consortium.

- Weibo is released by the published paper (doi: 10.18653/v1/D15-1064) and can be downloaded from https://github.com/hltcoe/golden-horse.

- Resume is released by the published paper (doi: 10.18653/v1/P18-1144) and can be downloaded from https://github.com/jiesutd/LatticeLSTM.

Notes

  1. https://storage.googleapis.com/bert_models/2018_11_03/chinese_L-12_H-768_A-12.zip

  2. https://github.com/ymcui/Chinese-BERT-wwm

  3. https://catalog.ldc.upenn.edu/LDC2011T13

  4. https://github.com/fxsjy/jieba


Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grants 62207002 and U1911201, and in part by the China Postdoctoral Science Foundation under Grants 2022TQ0040 and 2022M720486.

Author information

Corresponding author

Correspondence to Xuetao Tian.

Ethics declarations

Competing interests

- The authors have no relevant financial or non-financial interests to disclose.

- The authors have no competing interests to declare that are relevant to the content of this article.

- All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript.

- The authors have no financial or proprietary interests in any material discussed in this article.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Tian, X., Bu, X. & He, L. Multi-task learning with helpful word selection for lexicon-enhanced Chinese NER. Appl Intell 53, 19028–19043 (2023). https://doi.org/10.1007/s10489-023-04464-0

  • DOI: https://doi.org/10.1007/s10489-023-04464-0