Combining relations for information extraction from free text

Authors:

Mstislav Maslennikov,

Tat-Seng Chua

ACM Transactions on Information Systems (TOIS), Volume 28, Issue 3

Article No.: 14, Pages 1 - 35

https://doi.org/10.1145/1777432.1777437

Published: 02 July 2010

Abstract

Relations between entities of the same semantic type tend to be sparse in free text. Therefore, combining relations is the key to effective information extraction (IE) on free-text datasets with a small set of training samples. Previous approaches to bootstrapping for IE used different types of relations, such as dependency or co-occurrence, and faced the problems of paraphrasing and misalignment of instances. To cope with these problems, we propose a framework that integrates several types of relations. After extracting candidate entities, our framework evaluates relations between them at the phrasal, dependency, semantic frame, and discourse levels. For each of these levels, we build a classifier that outputs a score for relation instances. In order to integrate these scores, we propose three strategies: (1) integrate evaluation scores from each relation classifier; (2) incorporate the elimination of negatively labeled instances into strategy (1); and (3) add cascading of extracted relations into strategy (2). Our framework improves the state-of-the-art results for supervised systems by 8%, 15%, 3%, and 5% on the MUC4 (terrorism), MUC6 (management succession), ACE RDC 2003 (news, general types), and ACE RDC 2003 (news, specific types) domains, respectively.
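
As a rough illustration of strategies (1) and (2) from the abstract, the following minimal Python sketch combines per-level classifier scores and filters out negatively labeled instances. The abstract does not state how scores are integrated, so the weighted mean, the thresholds `reject_below` and `accept_at`, and all names (`RelationInstance`, `combine_scores`, `eliminate_negatives`, `extract`) are hypothetical assumptions, not the authors' implementation. Cascading, as in strategy (3), is not modeled.

```python
from dataclasses import dataclass

# The four relation levels named in the abstract.
LEVELS = ("phrasal", "dependency", "semantic_frame", "discourse")


@dataclass
class RelationInstance:
    """A candidate relation between two entities (hypothetical structure)."""
    head: str
    tail: str
    scores: dict  # level name -> classifier score, assumed to lie in [0, 1]


def combine_scores(instance, weights=None):
    """Strategy (1), sketched as a weighted mean of per-level scores.

    The weighted mean is an assumption; the abstract only says the
    scores from each relation classifier are integrated.
    """
    weights = weights or {level: 1.0 for level in LEVELS}
    total = sum(weights[level] for level in LEVELS)
    weighted = sum(weights[level] * instance.scores[level] for level in LEVELS)
    return weighted / total


def eliminate_negatives(instances, reject_below=0.2):
    """Strategy (2), sketched: drop instances that any level labels negative.

    An instance counts as negatively labeled here when any level's score
    falls below reject_below (an assumed threshold).
    """
    return [
        inst for inst in instances
        if all(inst.scores[level] >= reject_below for level in LEVELS)
    ]


def extract(instances, accept_at=0.5):
    """Filter negatives first, then accept instances by combined score."""
    kept = eliminate_negatives(instances)
    return [(i.head, i.tail) for i in kept if combine_scores(i) >= accept_at]
```

In this sketch, an instance with a high average score is still discarded if a single level rejects it, which is one plausible reading of "elimination of negatively labeled instances".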

Index Terms

Combining relations for information extraction from free text
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Language resources
2. Information systems
  1. Information retrieval
    1. Document representation
      1. Content analysis and feature selection

Published In

ACM Transactions on Information Systems Volume 28, Issue 3

June 2010

231 pages

ISSN:1046-8188

EISSN:1558-2868

DOI:10.1145/1777432

Copyright © 2010 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 02 July 2010

Accepted: 01 July 2009

Revised: 01 May 2009

Received: 01 November 2007

Published in TOIS Volume 28, Issue 3

Qualifiers

Research-article
Research
Refereed

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 6
Total Downloads: 822
Downloads (Last 12 months): 4
Downloads (Last 6 weeks): 1

Reflects downloads up to 15 Feb 2025

Citations

Cited By

Huang H, Yuan C, Liu Q, Cao Y (2023) Document-level Relation Extraction via Separate Relation Representation and Logical Reasoning. ACM Transactions on Information Systems 42:1 (1-24). Online publication date: 21-Aug-2023
https://dl.acm.org/doi/10.1145/3597610
Sun K, Zhang R, Mensah S, Mao Y, Liu X (2022) Learning Implicit and Explicit Multi-task Interactions for Information Extraction. ACM Transactions on Information Systems 41:2 (1-29). Online publication date: 11-Jun-2022
https://dl.acm.org/doi/10.1145/3533020
Shang Y, Huang H, Sun X, Wei W, Mao X (2022) Learning Relation Ties with a Force-Directed Graph in Distant Supervised Relation Extraction. ACM Transactions on Information Systems 41:1 (1-23). Online publication date: 9-Mar-2022
https://dl.acm.org/doi/10.1145/3520082
Zou J, Kanoulas E (2020) Towards Question-based High-recall Information Retrieval. ACM Transactions on Information Systems 38:3 (1-35). Online publication date: 18-May-2020
https://dl.acm.org/doi/10.1145/3388640
Wu Y (2018) Enriching a thesaurus as a better question-answering tool and information retrieval aid. Journal of Information Science 44:4 (512-525). Online publication date: 1-Aug-2018
https://dl.acm.org/doi/10.1177/0165551517706219
Sen S, Tao J, Deokar A (2014) On the Role of Ontologies in Information Extraction. Reshaping Society through Analytics, Collaboration, and Decision Support (115-133). Online publication date: 5-Nov-2014
https://doi.org/10.1007/978-3-319-11575-7_8
