CIKM '17 Conference Proceedings · short-paper
DOI: 10.1145/3132847.3133088

Chinese Named Entity Recognition with Character-Word Mixed Embedding

Published: 06 November 2017

ABSTRACT

Named Entity Recognition (NER) is an important foundation for natural language processing tasks such as relation extraction and entity linking. Existing Chinese NER systems commonly take the character sequence as input in order to avoid word segmentation. However, the character sequence alone cannot express enough semantic information, so the recognition accuracy of Chinese NER lags behind that of Western languages such as English. To address this issue, we propose a Chinese NER method based on Character-Word Mixed Embedding (CWME), which fits naturally into the standard Chinese natural language processing pipeline. Our experiments show that incorporating CWME effectively improves performance on Chinese corpora with state-of-the-art neural architectures widely used in NER, and the proposed method yields nearly 9% absolute improvement over previous results.
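As a rough illustration of the idea, the sketch below shows one way a character-word mixed input representation could be wired into a BiLSTM tagger in PyTorch: each character embedding is concatenated with the embedding of the segmented word that the character belongs to, and the combined sequence feeds a bidirectional LSTM with a per-character tag classifier. This is a minimal sketch under our own assumptions, not the authors' released implementation; the class name MixedEmbeddingTagger and all hyperparameters are hypothetical, and the CRF layer used in standard BiLSTM-CRF taggers is omitted for brevity.

```python
# Minimal sketch (not the authors' code): a character-word mixed embedding
# feeding a BiLSTM tagger. Each character id is paired with the id of the
# segmented word it belongs to; both embeddings are concatenated per character.
import torch
import torch.nn as nn


class MixedEmbeddingTagger(nn.Module):  # hypothetical name
    def __init__(self, n_chars, n_words, char_dim, word_dim, hidden, n_tags):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim)
        self.word_emb = nn.Embedding(n_words, word_dim)
        self.lstm = nn.LSTM(char_dim + word_dim, hidden,
                            bidirectional=True, batch_first=True)
        # A CRF layer would normally follow; a plain per-character classifier
        # is used here to keep the sketch short.
        self.out = nn.Linear(2 * hidden, n_tags)

    def forward(self, char_ids, word_ids):
        # char_ids, word_ids: (batch, seq_len); word_ids[b][j] is the id of
        # the word containing character j, repeated for every character in it.
        x = torch.cat([self.char_emb(char_ids), self.word_emb(word_ids)], dim=-1)
        h, _ = self.lstm(x)
        return self.out(h)  # (batch, seq_len, n_tags) emission scores


# Toy usage with random ids: a 2-sentence batch of 6 characters each.
model = MixedEmbeddingTagger(n_chars=5000, n_words=20000,
                             char_dim=64, word_dim=100, hidden=128, n_tags=9)
chars = torch.randint(0, 5000, (2, 6))
words = torch.randint(0, 20000, (2, 6))
print(model(chars, words).shape)  # torch.Size([2, 6, 9])
```

In such a setup the word-level embedding supplies the semantic context that isolated characters lack, while keeping the tagger's input aligned to the character sequence so segmentation errors do not change the output length.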


• Published in

  CIKM '17: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management
  November 2017
  2604 pages
  ISBN: 9781450349185
  DOI: 10.1145/3132847

          Copyright © 2017 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States



Acceptance Rates

CIKM '17 paper acceptance rate: 171 of 855 submissions, 20%. Overall acceptance rate: 1,861 of 8,427 submissions, 22%.
