
CNERVis: a visual diagnosis tool for Chinese named entity recognition

Regular Paper
Journal of Visualization

Abstract

Named entity recognition (NER) is a crucial initial task that identifies both the spans and the types of named entities, such as organizations, persons, locations, and times, in order to extract specific information. Deep learning approaches currently achieve state-of-the-art NER performance because they capture contextual features. However, the complex structure of deep learning models makes them black boxes and limits researchers' ability to improve them. Unlike languages written in the Latin alphabet, Chinese, like other languages such as Korean and Japanese, does not mark word boundaries explicitly. Therefore, preliminary steps such as word segmentation (WS) and part-of-speech (POS) tagging are needed before Chinese NER, and the correctness of these steps strongly influences the final NER prediction. Investigating model behavior in Chinese NER is thus more complicated and challenging. In this paper, we present CNERVis, a visual analysis tool that allows users to interactively inspect the WS-POS-NER pipeline and understand how and why an NER prediction is made. CNERVis also allows users to load large amounts of test data and explore critical instances, facilitating analysis of large datasets. We demonstrate the usability and effectiveness of the tool through case studies.
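To make the dependency of NER on word segmentation concrete, here is a minimal Python sketch; it is an illustration, not CNERVis's implementation. It assumes word-level BIO tags and OntoNotes-style type labels (GPE, FAC, PERSON); the tag sequences are hand-written rather than produced by a model, the POS step is omitted, and the helper names `decode_word_bio` and `show` are hypothetical. It uses the well-known ambiguous string 南京市长江大桥, which can be segmented as 南京市/长江大桥 ("Nanjing City / Yangtze River Bridge") or as 南京/市长/江大桥 ("Nanjing / mayor / Jiang Daqiao").

```python
from typing import List, Optional, Tuple

# (start_char, end_char_exclusive, entity_type)
Entity = Tuple[int, int, str]


def decode_word_bio(words: List[str], tags: List[str]) -> List[Entity]:
    """Turn word-level BIO tags into character-offset entity spans.

    Every word carries exactly one tag, so entity boundaries can only
    fall where the word segmenter placed a boundary.
    """
    if len(words) != len(tags):
        raise ValueError("words and tags must have the same length")
    entities: List[Entity] = []
    start: Optional[int] = None
    etype: Optional[str] = None
    offset = 0  # character offset of the current word in the sentence
    for word, tag in zip(words, tags):
        continuing = tag.startswith("I-") and start is not None and tag[2:] == etype
        if not continuing:
            # Close the currently open entity, if any.
            if start is not None:
                entities.append((start, offset, etype))
                start, etype = None, None
            # B-x opens a new entity; a stray I-x is treated like B-x.
            if tag != "O":
                start, etype = offset, tag[2:]
        offset += len(word)
    if start is not None:
        entities.append((start, offset, etype))
    return entities


def show(sentence: str, entities: List[Entity]) -> List[Tuple[str, str]]:
    """Map character spans back to surface strings for inspection."""
    return [(sentence[s:e], t) for s, e, t in entities]


sentence = "南京市长江大桥"

# Segmentation A: "Nanjing City / Yangtze River Bridge" (hand-written tags)
seg_a = ["南京市", "长江大桥"]
tags_a = ["B-GPE", "B-FAC"]

# Segmentation B: "Nanjing / mayor / Jiang Daqiao" (hand-written tags)
seg_b = ["南京", "市长", "江大桥"]
tags_b = ["B-GPE", "O", "B-PERSON"]

assert "".join(seg_a) == "".join(seg_b) == sentence
print(show(sentence, decode_word_bio(seg_a, tags_a)))
# → [('南京市', 'GPE'), ('长江大桥', 'FAC')]
print(show(sentence, decode_word_bio(seg_b, tags_b)))
# → [('南京', 'GPE'), ('江大桥', 'PERSON')]
```

In this sketch, once the segmenter produces 南京/市长/江大桥, the span 长江大桥 can no longer be recovered by any word-level tagger, because no word boundary exists at the right positions. This is the kind of upstream error that makes inspecting the whole WS-POS-NER pipeline, rather than the NER step alone, worthwhile.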

Graphic abstract


Figs. 1–12



Author information


Correspondence to Ko-Chih Wang.


Supplementary Information

Electronic supplementary material: Supplementary material 1 (MP4 video, 75,005 KB).


About this article


Cite this article

Lo, PS., Wu, JL., Deng, ST. et al. CNERVis: a visual diagnosis tool for Chinese named entity recognition. J Vis 25, 653–669 (2022). https://doi.org/10.1007/s12650-021-00799-3
