short-paper

Chinese Named Entity Recognition with Character-Word Mixed Embedding

Authors:
Shijia E

Tongji University, Shanghai, China

Tongji University, Shanghai, China
View Profile

,
Yang Xiang

Tongji University, Shanghai, China

Tongji University, Shanghai, China
View Profile

CIKM '17: Proceedings of the 2017 ACM on Conference on Information and Knowledge ManagementNovember 2017Pages 2055–2058https://doi.org/10.1145/3132847.3133088

Published:06 November 2017Publication History

CIKM '17: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management

Pages 2055–2058

ABSTRACT

Named Entity Recognition (NER) is an important basis for the tasks in natural language processing such as relation extraction, entity linking and so on. The common method of existing Chinese NER systems is to use the character sequence as the input, and the intention is to avoid the word segmentation. However, the character sequence cannot express enough semantic information, so that the recognition accuracy of Chinese NER is not as good as western language such as English. To solve this issue, we propose a Chinese NER method based on Character-Word Mixed Embedding (CWME), and the method is in accord with the pipeline of Chinese natural language processing. Our experiments show that incorporating CWME can effectively improve the performance for the Chinese corpus with state-of-the-art neural architectures widely used in NER, and the proposed method yields nearly 9% absolute improvement over previously results.

References

Miguel Ballesteros, Chris Dyer, and Noah A Smith. 2015. Improved transition-based parsing by modeling characters instead of words with LSTMs. arXiv preprint arXiv:1508.00657 (2015).Google Scholar
Yoshua Bengio, Réjean Ducharme, Pascal Vincent, and Christian Jauvin. 2003. A neural probabilistic language model. Journal of machine learning research Vol. 3, Feb (2003), 1137--1155. Google ScholarDigital Library
Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel Kuksa. 2011. Natural language processing (almost) from scratch. Journal of Machine Learning Research Vol. 12, Aug (2011), 2493--2537. Google ScholarDigital Library
Jenny Rose Finkel, Trond Grenager, and Christopher Manning. 2005. Incorporating non-local information into information extraction systems by Gibbs sampling. In Meeting on Association for Computational Linguistics. 363--370. Google ScholarDigital Library
Radu Florian, Abe Ittycheriah, Hongyan Jing, and Tong Zhang. 2003. Named entity recognition through classifier combination Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003-Volume 4. Association for Computational Linguistics, 168--171. Google ScholarDigital Library
Guohong Fu and Kang-Kwong Luke. 2005. Chinese named entity recognition using lexicalized HMMs. ACM SIGKDD Explorations Newsletter Vol. 7, 1 (2005), 19--25. Google ScholarDigital Library
Zhiheng Huang, Wei Xu, and Kai Yu. 2015. Bidirectional LSTM-CRF models for sequence tagging. arXiv preprint arXiv:1508.01991 (2015).Google Scholar
Diederik Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).Google Scholar
Onur Kuru, Ozan Arkan Can, and Deniz Yuret. 2016. CharNER: Character-Level Named Entity Recognition. COLING. 911--921.Google Scholar
Guillaume Lample, Miguel Ballesteros, Sandeep Subramanian, Kazuya Kawakami, and Chris Dyer. 2016. Neural architectures for named entity recognition. arXiv preprint arXiv:1603.01360 (2016).Google Scholar
Wang Ling, Tiago Luís, Luís Marujo, Ramón Fernandez Astudillo, Silvio Amir, Chris Dyer, Alan W Black, and Isabel Trancoso. 2015. Finding function in form: Compositional character models for open vocabulary word representation EMNLP. 1520--1530.Google Scholar
Xuezhe Ma and Eduard Hovy. 2016. End-to-end sequence labeling via bi-directional lstm-cnns-crf. arXiv preprint arXiv:1603.01354 (2016).Google Scholar
Alexandre Passos, Vineet Kumar, and Andrew McCallum. 2014. Lexicon infused phrase embeddings for named entity resolution. arXiv preprint arXiv:1404.5367 (2014).Google Scholar
Nanyun Peng and Mark Dredze. 2015. Named Entity Recognition for Chinese Social Media with Jointly Trained Embeddings. EMNLP. 548--554.Google Scholar
Yanjun Qi, Ronan Collobert, Pavel Kuksa, Koray Kavukcuoglu, and Jason Weston. 2009. Combining labeled and unlabeled data with word-class distribution learning Proceedings of the 18th ACM conference on Information and knowledge management. ACM, 1737--1740. Google ScholarDigital Library

Index Terms

Chinese Named Entity Recognition with Character-Word Mixed Embedding
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
  2. Machine learning
    1. Machine learning approaches
      1. Learning latent representations
      2. Neural networks

Recommendations

Arabic Named Entity Recognition Using Clustered Word Embedding
Computational Linguistics and Intelligent Text Processing
Abstract
Named Entity Recognition in Arabic is a challenging topic because of morphological and lexical richness of Arabic. In this paper, we propose an Arabic NER system that is based on word embedding. Word embedding hold semantic information about the ...
Read More
Simultaneous character-cluster-based word segmentation and named entity recognition in Thai language
KICSS'10: Proceedings of the 5th international conference on Knowledge, information, and creativity support systems

Named entity recognition in inherent-vowel alphabetic languages such as Burmese, Khmer, Lao, Tamil, Telugu, Bali, and Thai, is difficult since there are no explicit boundaries among words or sentences. This paper presents a novel method to exploit the ...
Read More
LACNNER: Lexicon-Aware Character Representation for Chinese Nested Named Entity Recognition
Advances in Swarm Intelligence
Abstract
Named Entity Recognition (NER) is one of fundamental researches in natural language processing. Chinese nested-NER is even more challenging. Recently, studies on NER have generally focused on the extraction of flat structures by sequence ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
CIKM '17: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management
November 2017
2604 pages
ISBN:9781450349185
DOI:10.1145/3132847
General Chairs:
Ee-Peng Lim
Singapore Management University, Singapore
,
Marianne Winslett
University of Illinois at Urbana-Champaign, USA, and Advanced Digital Sciences Center, Singapore
,
Program Chairs:
Mark Sanderson
RMIT, Australia
,
Ada Fu
Chinese University of Hong Kong, Hong Kong
,
Jimeng Sun
Georgia Tech, USA
,
Shane Culpepper
RMIT, Australia
,
Eric Lo
Chinese University of Hong Kong, Hong Kong
,
Joyce Ho
Emory University, USA
,
Debora Donato
Mix Tech, Inc., USA
,
Rakesh Agrawal
Data Insights Laboratories, USA
,
Yu Zheng
Microsoft Research Asia, China
,
Carlos Castillo
Qatar Computing Research Institute, Qatar
,
Aixin Sun
Nanyang Technological University, Singapore
,
Vincent S. Tseng
National Cheng Kung University, Taiwan
,
Chenliang Li
Wuhan University, China
Copyright © 2017 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 6 November 2017
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
character embedding
named entity recognition
word embedding
Qualifiers
- short-paper
Conference

Acceptance Rates
CIKM '17 Paper Acceptance Rate171of855submissions,20%Overall Acceptance Rate1,861of8,427submissions,22%
More
Upcoming Conference
CIKM '24

Sponsor:

sigir

sigir

The 33rd ACM International Conference on Information and Knowledge Management

October 21 - 25, 2024

Boise , ID , USA
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 5
  Total Citations
  View Citations
- 509
  Total Downloads
- Downloads (Last 12 months)10
- Downloads (Last 6 weeks)0
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Chinese Named Entity Recognition with Character-Word Mixed Embedding

CIKM '17: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management

ABSTRACT

References

Cited By

Index Terms

Recommendations

Arabic Named Entity Recognition Using Clustered Word Embedding

Simultaneous character-cluster-based word segmentation and named entity recognition in Thai language

LACNNER: Lexicon-Aware Character Representation for Chinese Nested Named Entity Recognition