ABSTRACT
Named entity recognition is an important task and a foundation for intelligent information processing and knowledge representation learning of the Zhuang language. A BiLSTM-CNN-CRF network model that incorporates the case (uppercase and lowercase) features of words is proposed for named entity recognition in the Zhuang language, for which annotated named-entity corpora are scarce. First, word2vec is trained on unlabeled Zhuang text to obtain Zhuang word vectors. Then a convolutional neural network extracts the character-level features of Zhuang words, yielding a character feature vector. These two vectors are concatenated with randomly initialized case feature vectors, and the concatenated vectors are fed into a BiLSTM-CNN-CRF model for training, giving an end-to-end named entity recognition model for the Zhuang language. Experimental results show that, without relying on hand-crafted features or external dictionaries, the proposed method outperforms the comparison models, achieving an F1 score of 80.37% on the named entity recognition task and enabling automated named entity recognition for the Zhuang language.
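The input representation described above (word vector, CNN character features, and a randomly initialized case vector, concatenated per token) can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the case categories, vector sizes, filter window, lowercase word lookup, and placeholder tokens are all assumptions made for illustration, the word vectors are random stand-ins for word2vec output, and the BiLSTM and CRF layers are omitted.

```python
import random

# Assumed sizes and categories; the abstract does not specify any of them.
WORD_DIM, CHAR_DIM, N_FILTERS, CASE_DIM, WINDOW = 8, 5, 4, 3, 3
CASE_TYPES = ["lower", "upper", "title", "mixed", "other"]


def case_type(word):
    """Map a token to a coarse case category (hypothetical scheme)."""
    if not any(ch.isalpha() for ch in word):
        return "other"
    if word.islower():
        return "lower"
    if word.isupper():
        return "upper"
    if word[0].isupper() and word[1:].islower():
        return "title"
    return "mixed"


def random_table(keys, dim, rng):
    """Randomly initialized lookup table: key -> vector of length dim."""
    return {k: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for k in keys}


def char_cnn(word, char_table, filters, window=WINDOW):
    """1-D convolution over character vectors followed by max-over-time pooling."""
    pad = [0.0] * CHAR_DIM
    half = window // 2
    vecs = [pad] * half + [char_table.get(ch, pad) for ch in word] + [pad] * half
    pooled = []
    for filt in filters:
        scores = []
        for i in range(len(vecs) - window + 1):
            patch = [x for vec in vecs[i:i + window] for x in vec]
            scores.append(sum(w * x for w, x in zip(filt, patch)))
        pooled.append(max(scores))
    return pooled


def token_features(word, word_table, char_table, filters, case_table):
    """Concatenate word, character-CNN, and case vectors for one token."""
    # Lowercase lookup is an assumption; unknown words fall back to zeros.
    word_vec = word_table.get(word.lower(), [0.0] * WORD_DIM)
    char_vec = char_cnn(word, char_table, filters)
    case_vec = case_table[case_type(word)]
    return word_vec + char_vec + case_vec


rng = random.Random(0)
sentence = ["Gvangjsih", "miz", "2020"]  # placeholder tokens, not from the paper's corpus

# Random stand-in for vectors that word2vec would learn from unlabeled text.
word_table = random_table({w.lower() for w in sentence}, WORD_DIM, rng)
char_table = random_table({ch for w in sentence for ch in w}, CHAR_DIM, rng)
case_table = random_table(CASE_TYPES, CASE_DIM, rng)
filters = [[rng.uniform(-0.5, 0.5) for _ in range(WINDOW * CHAR_DIM)]
           for _ in range(N_FILTERS)]

features = [token_features(w, word_table, char_table, filters, case_table)
            for w in sentence]
print(len(features), len(features[0]))
```

In a full model, each per-token vector in `features` would be the input at one time step of the BiLSTM, whose outputs a CRF layer would decode into entity labels; the character and case tables would be trained jointly with the rest of the network rather than left at their random initial values.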