Combining relations for information extraction from free text

Published: 02 July 2010

Abstract

Relations between entities of the same semantic type tend to be sparse in free text. Combining relations is therefore key to effective information extraction (IE) on free-text datasets with only a small set of training samples. Previous approaches to bootstrapping for IE used different types of relations, such as dependency or co-occurrence, and faced the problems of paraphrasing and misalignment of instances. To cope with these problems, we propose a framework that integrates several types of relations. After extracting candidate entities, our framework evaluates relations between them at the phrasal, dependency, semantic frame, and discourse levels. For each of these levels, we build a classifier that outputs a score for relation instances. To integrate these scores, we propose three strategies: (1) integrate the evaluation scores from each relation classifier; (2) add the elimination of negatively labeled instances to strategy (1); and (3) add cascading of extracted relations to strategy (2). Our framework improves on the state-of-the-art results for supervised systems by 8%, 15%, 3%, and 5% on the MUC4 (terrorism), MUC6 (management succession), ACE RDC 2003 (news, general types), and ACE RDC 2003 (news, specific types) domains, respectively.
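
The three integration strategies can be illustrated with a minimal sketch. The abstract does not give the scoring formulas, so everything concrete here is an assumption made for illustration only: the weighted average in `combine`, the `neg_threshold` used to treat a low score as a negative label, and the entity-sharing `bonus` used to model cascading. The names `Candidate`, `combine`, `eliminate_negatives`, and `cascade` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# The four levels named in the abstract.
LEVELS = ("phrasal", "dependency", "semantic_frame", "discourse")


@dataclass
class Candidate:
    """A candidate relation between two entities, with one score per level."""
    pair: Tuple[str, str]
    scores: Dict[str, float]  # level name -> classifier score, assumed in [0, 1]


def combine(scores: Dict[str, float],
            weights: Optional[Dict[str, float]] = None) -> float:
    """Strategy 1 (assumed form): weighted average of the per-level scores."""
    weights = weights or {level: 1.0 for level in scores}
    total = sum(weights[level] for level in scores)
    return sum(weights[level] * s for level, s in scores.items()) / total


def eliminate_negatives(candidates: List[Candidate],
                        neg_threshold: float = 0.2) -> List[Candidate]:
    """Strategy 2 (assumed form): drop a candidate if any level scores it
    below neg_threshold, i.e. treats it as negatively labeled."""
    return [c for c in candidates if min(c.scores.values()) >= neg_threshold]


def cascade(candidates: List[Candidate], accept: float = 0.5,
            bonus: float = 0.1, neg_threshold: float = 0.2) -> List[Tuple[str, str]]:
    """Strategy 3 (assumed form): after negative elimination, accept candidates
    in descending score order; a candidate that shares an entity with an
    already accepted relation gets a small bonus."""
    accepted: List[Tuple[str, str]] = []
    seen_entities = set()
    remaining = eliminate_negatives(candidates, neg_threshold)
    for c in sorted(remaining, key=lambda c: combine(c.scores), reverse=True):
        score = combine(c.scores)
        if seen_entities & set(c.pair):
            score = min(1.0, score + bonus)
        if score >= accept:
            accepted.append(c.pair)
            seen_entities.update(c.pair)
    return accepted


def uniform(value: float) -> Dict[str, float]:
    """Helper for the demo: the same score at every level."""
    return {level: value for level in LEVELS}


if __name__ == "__main__":
    demo = [
        Candidate(("Smith", "Acme"), uniform(0.8)),
        Candidate(("Smith", "CEO"), uniform(0.45)),
        Candidate(("Jones", "Acme"), {**uniform(0.9), "discourse": 0.1}),
        Candidate(("Lee", "Corp"), uniform(0.45)),
    ]
    print(cascade(demo))
```

In this toy setup, the (Jones, Acme) candidate has a high average score but is removed because one level scores it below the threshold, while (Smith, CEO) is accepted only because it shares an entity with a relation accepted earlier. Both behaviors follow from the assumed thresholds, not from the paper's reported method.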

Published In

ACM Transactions on Information Systems  Volume 28, Issue 3
June 2010
231 pages
ISSN:1046-8188
EISSN:1558-2868
DOI:10.1145/1777432

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 02 July 2010
Accepted: 01 July 2009
Revised: 01 May 2009
Received: 01 November 2007
Published in TOIS Volume 28, Issue 3

Author Tags

  1. Information extraction
  2. bootstrapping
  3. dependency relations
  4. discourse relations
  5. semantic relations

Qualifiers

  • Research-article
  • Research
  • Refereed

Cited By

  • (2023) Document-level Relation Extraction via Separate Relation Representation and Logical Reasoning. ACM Transactions on Information Systems 42:1 (1-24). DOI: 10.1145/3597610. Online publication date: 21-Aug-2023
  • (2022) Learning Implicit and Explicit Multi-task Interactions for Information Extraction. ACM Transactions on Information Systems 41:2 (1-29). DOI: 10.1145/3533020. Online publication date: 11-Jun-2022
  • (2022) Learning Relation Ties with a Force-Directed Graph in Distant Supervised Relation Extraction. ACM Transactions on Information Systems 41:1 (1-23). DOI: 10.1145/3520082. Online publication date: 9-Mar-2022
  • (2020) Towards Question-based High-recall Information Retrieval. ACM Transactions on Information Systems 38:3 (1-35). DOI: 10.1145/3388640. Online publication date: 18-May-2020
  • (2018) Enriching a thesaurus as a better question-answering tool and information retrieval aid. Journal of Information Science 44:4 (512-525). DOI: 10.1177/0165551517706219. Online publication date: 1-Aug-2018
  • (2014) On the Role of Ontologies in Information Extraction. Reshaping Society through Analytics, Collaboration, and Decision Support (115-133). DOI: 10.1007/978-3-319-11575-7_8. Online publication date: 5-Nov-2014
