ABSTRACT
Named Entity Recognition (NER) is an important foundation for natural language processing tasks such as relation extraction and entity linking. Existing Chinese NER systems commonly take the character sequence as input in order to avoid word segmentation. However, a character sequence alone does not carry enough semantic information, so recognition accuracy for Chinese NER lags behind that for Western languages such as English. To address this issue, we propose a Chinese NER method based on Character-Word Mixed Embedding (CWME), which is consistent with the pipeline of Chinese natural language processing. Our experiments show that incorporating CWME into state-of-the-art neural architectures widely used for NER effectively improves performance on the Chinese corpus, and the proposed method yields a nearly 9% absolute improvement over previous results.
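The abstract does not say how character and word information are combined, so the following is only an illustrative sketch of one plausible reading: each character's vector is concatenated with the vector of the segmented word that contains it, and the result would be fed to a character-level tagger. The function name `mixed_embeddings`, the toy lookup tables, the zero-vector fallback for unknown items, and the choice of concatenation are all assumptions made for illustration, not the authors' stated method.

```python
from typing import Dict, List

Vector = List[float]


def mixed_embeddings(
    words: List[str],
    char_table: Dict[str, Vector],
    word_table: Dict[str, Vector],
    char_dim: int,
    word_dim: int,
) -> List[Vector]:
    """Return one vector per character: [char vector ; containing-word vector].

    `words` is an already segmented sentence. Concatenation is an assumed
    mixing strategy; unknown characters or words fall back to zero vectors.
    """
    unk_char = [0.0] * char_dim
    unk_word = [0.0] * word_dim
    mixed: List[Vector] = []
    for word in words:
        word_vec = word_table.get(word, unk_word)
        for ch in word:
            char_vec = char_table.get(ch, unk_char)
            # List concatenation keeps the character part first, word part second.
            mixed.append(char_vec + word_vec)
    return mixed


# Toy lookup tables (hypothetical values, not trained embeddings).
char_table = {"北": [1.0, 0.0], "京": [0.0, 1.0], "大": [0.5, 0.5], "学": [0.2, 0.8]}
word_table = {"北京": [0.1, 0.2, 0.3], "大学": [0.4, 0.5, 0.6]}

features = mixed_embeddings(["北京", "大学"], char_table, word_table, 2, 3)
for vec in features:
    print(vec)
```

In this sketch the tagger still labels one character at a time, so entity boundaries are not bound to segmentation output, while each position also carries word-level context.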
Index Terms
- Chinese Named Entity Recognition with Character-Word Mixed Embedding