ABSTRACT
Named Entity Recognition (NER) is a subtask of natural language processing whose accuracy is crucial for downstream tasks. In Chinese NER, word information is often added to enhance the semantic and boundary information of Chinese words, but these methods ignore the radical information of Chinese characters. This paper proposes a multi-feature fusion model (MFFM) for Chinese NER. First, the input sequence is fed into each of the BERT layer, the word embedding layer, and the radical embedding layer; then the outputs of these three layers are combined as the input of a Bidirectional Long Short-Term Memory (BiLSTM) layer, which models the contextual information; finally, a conditional random field (CRF) labels the sequence. The proposed model avoids introducing complex structures while effectively capturing the contextual character features, thereby improving recognition performance. Experimental results show that MFFM reaches an F1 score of 71.02% on the Weibo dataset, 3.12% higher than the BERT model, and 82.78% on the OntoNotes 4.0 dataset, 0.85% higher than the BERT model.
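The fusion and decoding stages described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names `fuse_features` and `viterbi_decode`, the toy BIO tag set, and all scores are hypothetical; concatenation is assumed as the way the three layer outputs are "combined"; and the BiLSTM is replaced by fixed emission scores so that only the CRF decoding step (Viterbi search over tag transitions) is shown.

```python
from typing import List, Sequence

Vector = List[float]


def fuse_features(bert: Sequence[Vector], word: Sequence[Vector],
                  radical: Sequence[Vector]) -> List[Vector]:
    """Combine three per-character feature sequences by concatenation.

    Assumption: concatenation is one plausible way to "combine" the layers;
    the resulting vectors would be the BiLSTM input.
    """
    if not (len(bert) == len(word) == len(radical)):
        raise ValueError("each feature sequence needs one vector per character")
    return [list(b) + list(w) + list(r) for b, w, r in zip(bert, word, radical)]


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]],
                   tags: Sequence[str]) -> List[str]:
    """Return the highest-scoring tag sequence under a linear-chain CRF.

    emissions[t][j]:   score of tag j at character t (in MFFM, from the BiLSTM)
    transitions[i][j]: score of tag i being followed by tag j
    """
    if not emissions:
        return []
    n = len(tags)
    # score[j] = best total score of any tag path ending in tag j so far
    score = list(emissions[0])
    backpointers: List[List[int]] = []
    for emit in emissions[1:]:
        new_score, pointers = [], []
        for j in range(n):
            # Best previous tag i for reaching tag j at this position
            best_i = max(range(n), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Start from the best final tag and follow the back-pointers.
    best = max(range(n), key=lambda j: score[j])
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    path.reverse()
    return [tags[j] for j in path]


# Toy BIO example (all numbers hypothetical; a real CRF learns transitions).
TAGS = ["O", "B-PER", "I-PER"]
NEG = -1e4  # effectively forbids a transition
TRANSITIONS = [
    [0.0, 0.0, NEG],  # O -> I-PER is invalid in BIO
    [0.0, 0.0, 0.5],  # B-PER -> I-PER is encouraged
    [0.0, 0.0, 0.5],  # I-PER -> I-PER is encouraged
]

if __name__ == "__main__":
    print(fuse_features([[1.0]], [[2.0, 3.0]], [[4.0]]))  # [[1.0, 2.0, 3.0, 4.0]]
    emissions = [[0.1, 2.0, 0.0], [0.2, 0.0, 1.5], [2.0, 0.0, 0.3]]
    print(viterbi_decode(emissions, TRANSITIONS, TAGS))  # ['B-PER', 'I-PER', 'O']
```

Because the O to I-PER transition is effectively forbidden in this toy setup, emissions of `[[2.0, 0.0, 0.0], [0.1, 0.0, 1.0]]` decode to `['O', 'O']` even though the second character's highest emission score is I-PER. This joint decoding over tag transitions illustrates why a CRF layer is commonly placed on top of per-character scores instead of taking each character's argmax independently.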
Index Terms
- A Chinese Named Entity Recognition Method Fusing Word and Radical Features