Chinese Open Relation Extraction and Knowledge Base Establishment

Published: 14 February 2018


Named entity relation extraction is an important subject in the field of information extraction. Although many English extractors have achieved reasonable performance, an effective system for Chinese relation extraction has yet to be developed, owing to the lack of annotated Chinese corpora and the particularities of Chinese linguistics. In this article, we summarize three kinds of phenomena that are unique to, yet common in, Chinese. We investigate unsupervised, linguistics-based Chinese open relation extraction (ORE), which can automatically discover arbitrary relations without any manually labeled datasets, and we study the construction of a large-scale corpus. By mapping entity relations onto dependency trees and taking these Chinese-specific characteristics into account, we propose a novel unsupervised Chinese ORE model based on Dependency Semantic Normal Forms (DSNFs). The model imposes no restrictions on the relative positions of entities and relations, and it achieves a high yield by extracting relations mediated by verbs or nouns and by processing parallel clauses. Empirical results demonstrate the effectiveness of the method: it obtains stable performance on four heterogeneous datasets and achieves better precision and recall than several existing Chinese ORE systems. Furthermore, by applying our method to web text, we build and publish a large-scale knowledge base of entities and relations, called COER, which helps address the shortage of Chinese corpora.
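The abstract describes the model only at a high level: entity relations are mapped onto dependency trees, relations mediated by verbs or nouns are extracted, and parallel structures are handled. The Python sketch below illustrates that general idea with two hand-picked patterns; it is not the authors' DSNF model, whose actual normal forms are defined in the full article. Assumptions: the dependency parses are written by hand rather than produced by a parser, the labels follow the HIT LTP scheme (SBV subject, VOB object, ATT attribute, COO coordination), the entity set is given in advance instead of coming from a named entity recognizer, and `Token`, `children`, `expand_coo`, and `extract` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


@dataclass
class Token:
    word: str
    head: int  # 1-based index of the head token; 0 marks the root
    rel: str   # dependency label in LTP style (SBV, VOB, ATT, COO, ...)


def children(tokens: List[Token], head: int, rel: str) -> List[int]:
    """1-based indices of the tokens attached to `head` with label `rel`."""
    return [i for i, t in enumerate(tokens, start=1) if t.head == head and t.rel == rel]


def expand_coo(tokens: List[Token], idx: int) -> List[int]:
    """An argument plus every token coordinated with it via COO ("A 和 B")."""
    out = [idx]
    for c in children(tokens, idx, "COO"):
        out.extend(expand_coo(tokens, c))
    return out


def extract(tokens: List[Token], entities: Set[str]) -> List[Triple]:
    """Match two illustrative dependency patterns; return (arg1, rel, arg2) triples."""

    def word(i: int) -> str:
        return tokens[i - 1].word

    triples: List[Triple] = []
    for i, tok in enumerate(tokens, start=1):
        # Verb-mediated pattern: subject -SBV-> predicate <-VOB- object.
        subjects = children(tokens, i, "SBV")
        if not subjects and tok.rel == "COO":
            # A coordinated predicate without its own subject inherits the
            # subject of the predicate it is coordinated with.
            subjects = children(tokens, tok.head, "SBV")
        for s in subjects:
            for o in children(tokens, i, "VOB"):
                for s2 in expand_coo(tokens, s):
                    for o2 in expand_coo(tokens, o):
                        if word(s2) in entities and word(o2) in entities:
                            triples.append((word(s2), tok.word, word(o2)))
        # Noun-mediated pattern: E1 -ATT-> relation noun -ATT-> E2,
        # e.g. "甲公司 董事长 张三" yields (张三, 董事长, 甲公司).
        if tok.word not in entities and tok.rel == "ATT" and word(tok.head) in entities:
            for m in children(tokens, i, "ATT"):
                if word(m) in entities:
                    triples.append((word(tok.head), tok.word, word(m)))
    return triples


if __name__ == "__main__":
    # Hand-written parse of "甲公司 董事长 张三 投资 乙公司"
    # ("Zhang San, chairman of Company A, invests in Company B").
    sentence = [
        Token("甲公司", 2, "ATT"),
        Token("董事长", 3, "ATT"),
        Token("张三", 4, "SBV"),
        Token("投资", 0, "HED"),
        Token("乙公司", 4, "VOB"),
    ]
    print(extract(sentence, {"甲公司", "张三", "乙公司"}))
    # [('张三', '董事长', '甲公司'), ('张三', '投资', '乙公司')]
```

Because the patterns match tree shapes rather than word order, the subject, relation word, and object need not be adjacent in the sentence, which loosely mirrors the abstract's point that the model places no restrictions on the relative positions of entities and relations.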

Supplementary Material

a15-jia-apndx.pdf: supplemental movie, appendix, image and software files for Chinese Open Relation Extraction and Knowledge Base Establishment


Information & Contributors


Published In

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 17, Issue 3
September 2018
196 pages


Publisher: Association for Computing Machinery, New York, NY, United States

Publication History

Published: 14 February 2018
Accepted: 01 November 2017
Revised: 01 July 2017
Received: 01 April 2017
Published in TALLIP Volume 17, Issue 3




Author Tags

  1. Chinese entity relation extraction and knowledge base
  2. dependency parsing
  3. linguistics
  4. open
  5. unsupervised


Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • Project of Science and Technology Commission of Shanghai Municipality
  • National Basic Research Program of China
  • National Natural Science Foundation of China


