The Impact of Enriched Linguistic Annotation on the Performance of Extracting Relation Triples

Kim, Sanghee; Lewis, Paul; Martinez, Kirk

doi:10.1007/978-3-540-24630-5_68

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2945)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics

Abstract

A relation extraction system recognises pre-defined relation types between two identified entities in natural language documents. It is important for the task of automatically locating missing instances in a knowledge base, where each instance is represented as a triple (‘entity – relation – entity’). A relation entry specifies a set of rules associated with the syntactic and semantic conditions under which appropriate relations are extracted. Creating such rules manually requires knowledge from information experts; moreover, it is time-consuming and error-prone when the input sentences have little consistency in structure and vocabulary. In this paper, we present an approach that applies a symbolic learning algorithm to sentences in order to automatically induce extraction rules, which can then classify new sentences. The proposed approach takes semantic attributes (e.g., semantically close words and named entities) into account when generalising common patterns among the sentences, which enables the system to cope better with syntactically different but semantically similar sentences. Not only does this increase the number of relations extracted, but it also improves the accuracy of extraction by adding features that might not be discovered with syntactic analysis alone. Experimental results show that this approach is effective on sentences from Web documents, achieving 17% higher precision and 34% higher recall.
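As a rough illustration of the ideas in the abstract, the sketch below represents an extracted instance as an ‘entity – relation – entity’ triple and applies a single extraction rule that combines named-entity types with a set of semantically close trigger words. It is not the system described in the paper: the rule here is written by hand rather than induced by a symbolic learner, and every name in it (`Triple`, `Rule`, `apply_rule`, the `place_of_birth` relation, the entity labels and the trigger words) is a hypothetical choice made for this example.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class Triple:
    """An 'entity - relation - entity' instance for a knowledge base."""
    subject: str
    relation: str
    obj: str


@dataclass(frozen=True)
class Rule:
    """A hand-written stand-in for an induced extraction rule (assumption)."""
    relation: str
    subject_type: str               # required named-entity type of the first entity
    object_type: str                # required named-entity type of the second entity
    trigger_words: FrozenSet[str]   # semantically close words that signal the relation


# A token is (surface text, named-entity type); "O" marks a non-entity token.
Token = Tuple[str, str]


def apply_rule(rule: Rule, tokens: List[Token]) -> Optional[Triple]:
    """Return a triple if the annotated sentence satisfies the rule, else None."""
    # First token whose entity type matches the subject condition.
    subj = next((i for i, (_, t) in enumerate(tokens) if t == rule.subject_type), None)
    if subj is None:
        return None
    # First later token whose entity type matches the object condition.
    obj = next(
        (i for i, (_, t) in enumerate(tokens) if i > subj and t == rule.object_type),
        None,
    )
    if obj is None:
        return None
    # The relation holds only if a trigger word occurs between the two entities.
    between = {word.lower() for word, _ in tokens[subj + 1:obj]}
    if between & rule.trigger_words:
        return Triple(tokens[subj][0], rule.relation, tokens[obj][0])
    return None


# Hypothetical rule: "born" and "native" are treated as semantically close.
birth_rule = Rule("place_of_birth", "PERSON", "LOCATION", frozenset({"born", "native"}))

sentence_a = [("Rembrandt", "PERSON"), ("was", "O"), ("born", "O"),
              ("in", "O"), ("Leiden", "LOCATION")]
sentence_b = [("Rembrandt", "PERSON"), (",", "O"), ("a", "O"),
              ("native", "O"), ("of", "O"), ("Leiden", "LOCATION")]
sentence_c = [("Rembrandt", "PERSON"), ("visited", "O"), ("Leiden", "LOCATION")]

for sentence in (sentence_a, sentence_b, sentence_c):
    print(apply_rule(birth_rule, sentence))
```

The first two sentences differ syntactically but share a trigger word from the same semantic group, so both yield the same triple, while the third yields nothing. A rule keyed to a single surface pattern would miss the second sentence, which is the kind of recall gain the abstract attributes to semantic attributes.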

Author information

Authors and Affiliations

Intelligence, Agents, MultiMedia Group, Department of Electronics and Computer Science, University of Southampton, U.K.
Sanghee Kim, Paul Lewis & Kirk Martinez

Editor information

Editors and Affiliations

National Polytechnic Institute, Center for Computing Research, 07738, Mexico City, México
Alexander Gelbukh

About this paper

Cite this paper

Kim, S., Lewis, P., Martinez, K. (2004). The Impact of Enriched Linguistic Annotation on the Performance of Extracting Relation Triples. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2004. Lecture Notes in Computer Science, vol 2945. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24630-5_68

DOI: https://doi.org/10.1007/978-3-540-24630-5_68
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-21006-1
Online ISBN: 978-3-540-24630-5
eBook Packages: Springer Book Archive