Chinese Open Relation Extraction and Knowledge Base Establishment

Published: 14 February 2018

Abstract

Named entity relation extraction is an important subject in information extraction. Although many English extractors achieve reasonable performance, an effective system for Chinese relation extraction has yet to be developed, owing to the lack of annotated Chinese corpora and the distinctive characteristics of the Chinese language. In this article, we summarize three kinds of unique but common phenomena in Chinese linguistics and investigate unsupervised, linguistics-based Chinese open relation extraction (ORE), which automatically discovers arbitrary relations without any manually labeled datasets; we also study the construction of a large-scale corpus. By mapping entity relations onto dependency trees and taking the unique characteristics of Chinese into account, we propose a novel unsupervised Chinese ORE model based on Dependency Semantic Normal Forms (DSNFs). The model places no restrictions on the relative positions of entities and relations, and it achieves a high yield by extracting relations mediated by verbs or nouns and by processing parallel clauses. Empirical results demonstrate the effectiveness of the method: it performs stably on four heterogeneous datasets and achieves better precision and recall than several Chinese ORE systems. Furthermore, by applying our method to web text, we build and publish a large-scale knowledge base of entities and relations, called COER, which helps address the shortage of Chinese corpora.
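To make the general idea of dependency-based open relation extraction concrete, the following is a minimal sketch, not the DSNF model itself. It assumes a sentence has already been parsed into tokens carrying a head index and a dependency label, and it uses LTP-style label names (`SBV` for subject, `VOB` for object, `COO` for coordination) purely as an illustrative convention. The `Token` class, the helper functions, and the hand-built example parse are all hypothetical. The sketch handles only one simple verb-mediated pattern plus coordinated entities, which is a much narrower case than the noun-mediated relations and parallel clauses described above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Token:
    idx: int                 # 1-based position in the sentence
    word: str
    head: int                # index of the governing token, 0 for the root
    deprel: str              # dependency label (LTP-style names assumed)
    is_entity: bool = False  # True if the token is a named entity


def children(tokens: List[Token], head_idx: int, deprel: str) -> List[Token]:
    """Return the tokens attached to head_idx with the given dependency label."""
    return [t for t in tokens if t.head == head_idx and t.deprel == deprel]


def with_coordinates(tokens: List[Token], tok: Token) -> List[Token]:
    """Return tok followed by the tokens directly coordinated with it."""
    return [tok] + children(tokens, tok.idx, "COO")


def extract_triples(tokens: List[Token]) -> List[Tuple[str, str, str]]:
    """Extract (entity, relation, entity) triples for the subject-verb-object pattern."""
    triples = []
    for verb in tokens:
        for subj in children(tokens, verb.idx, "SBV"):
            for obj in children(tokens, verb.idx, "VOB"):
                # Expand coordinated entities so each one yields its own triple.
                for s in with_coordinates(tokens, subj):
                    for o in with_coordinates(tokens, obj):
                        if s.is_entity and o.is_entity:
                            triples.append((s.word, verb.word, o.word))
    return triples


# Hand-built parse of "李明 和 王芳 访问 北京" (Li Ming and Wang Fang visit Beijing).
SENTENCE = [
    Token(1, "李明", 4, "SBV", is_entity=True),
    Token(2, "和", 3, "LAD"),
    Token(3, "王芳", 1, "COO", is_entity=True),
    Token(4, "访问", 0, "HED"),
    Token(5, "北京", 4, "VOB", is_entity=True),
]

if __name__ == "__main__":
    for triple in extract_triples(SENTENCE):
        print(triple)
```

Because the pattern is matched on tree structure rather than on word order, the same function applies wherever the subject and object sit relative to the verb in the sentence. A fuller system would need many more patterns than this single one.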

Supplementary Material

a15-jia-apndx.pdf (jia.zip)
Supplemental movie, appendix, image, and software files for Chinese Open Relation Extraction and Knowledge Base Establishment

Information

Published In

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 17, Issue 3
September 2018
196 pages
ISSN:2375-4699
EISSN:2375-4702
DOI:10.1145/3184403
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 14 February 2018
Accepted: 01 November 2017
Revised: 01 July 2017
Received: 01 April 2017
Published in TALLIP Volume 17, Issue 3

Author Tags

  1. Chinese entity relation extraction
  2. knowledge base
  3. dependency parsing
  4. linguistics
  5. open
  6. unsupervised

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • Project of Science and Technology Commission of Shanghai Municipality
  • National Basic Research Program of China
  • National Natural Science Foundation of China

Cited By

  • (2024) Archetypes of influential users in social question-answering sites. Internet Research 35:1 (419-447). DOI: 10.1108/INTR-05-2023-0400. Online publication date: 24-May-2024
  • (2024) APRCOIE: An open information extraction system for Chinese. SoftwareX 26 (101649). DOI: 10.1016/j.softx.2024.101649. Online publication date: May-2024
  • (2023) Improving Extraction of Chinese Open Relations Using Pre-trained Language Model and Knowledge Enhancement. Data Intelligence 5:4 (962-989). DOI: 10.1162/dint_a_00227. Online publication date: 1-Nov-2023
  • (2023) Chinese Open Event Extraction via Low-Rank Adaptation of ChatGLM6b. Proceedings of the 2023 3rd Guangdong-Hong Kong-Macao Greater Bay Area Artificial Intelligence and Big Data Forum (388-393). DOI: 10.1145/3660395.3660461. Online publication date: 22-Sep-2023
  • (2023) Reading Scene Text with Aggregated Temporal Convolutional Encoder. ACM Transactions on Asian and Low-Resource Language Information Processing 22:11 (1-16). DOI: 10.1145/3625822. Online publication date: 12-Oct-2023
  • (2023) Document-Level Relation Extraction with Path Reasoning. ACM Transactions on Asian and Low-Resource Language Information Processing 22:4 (1-14). DOI: 10.1145/3572898. Online publication date: 25-Mar-2023
  • (2023) Discovering Reliable Information Extraction Patterns with Pre-Trained Model for Text with Writing Style. 2023 IEEE International Conference on Systems, Man, and Cybernetics (SMC) (1690-1695). DOI: 10.1109/SMC53992.2023.10394538. Online publication date: 1-Oct-2023
  • (2023) Search and Inference of Collision-Preventing Regulations Based on Chinese Semantic Similarity. 2023 4th International Seminar on Artificial Intelligence, Networking and Information Technology (AINIT) (668-671). DOI: 10.1109/AINIT59027.2023.10212712. Online publication date: 16-Jun-2023
  • (2023) DoreBer: Document-Level Relation Extraction Method Based on BernNet. IEEE Access 11 (136468-136477). DOI: 10.1109/ACCESS.2023.3337871. Online publication date: 2023
  • (2023) Understanding the formation process of negative customer engagement behaviours: a quantitative and qualitative interpretation. Total Quality Management & Business Excellence 35:1-2 (170-201). DOI: 10.1080/14783363.2023.2277395. Online publication date: 14-Nov-2023
