Improving Open Information Extraction for Informal Web Documents with Ripple-Down Rules

Kim, Myung Hee; Compton, Paul

doi:10.1007/978-3-642-32541-0_14


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7457)

Included in the following conference series:

Pacific Rim Knowledge Acquisition Workshop


Abstract

The World Wide Web contains a massive amount of information in unstructured natural language, and obtaining valuable information from informally written Web documents is a major research challenge. One research focus is Open Information Extraction (OIE), which aims at relation-independent information extraction. OIE systems seek to extract all potential relations from the text rather than a few pre-defined relations. Existing OIE systems have mainly focused on the Web’s heterogeneity rather than the Web’s informality. The performance of the REVERB system, a state-of-the-art OIE system, drops dramatically as informality increases in Web documents.

This paper proposes a Hybrid Ripple-Down Rules based Open Information Extraction (Hybrid RDROIE) system, which uses Ripple-Down Rules (RDR) on top of a conventional OIE system. The Hybrid RDROIE system applies RDR’s incremental learning technique as an add-on to the state-of-the-art REVERB OIE system, to correct the degradation in REVERB’s performance caused by the Web’s informality in a domain of interest. With this wrapper approach, REVERB provides the baseline performance and RDR corrects its errors in the domain of interest. The Hybrid RDROIE system doubled REVERB’s performance in a domain of interest after two hours of training.
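The wrapper idea described above can be illustrated with a minimal sketch. It is not the authors' implementation and does not call REVERB: `toy_base_extractor` is a hypothetical stand-in for the base OIE system, the two rules are invented for illustration, and the rule structure assumed here is a simple single-classification ripple-down rule tree in which the last rule that fires supplies the conclusion.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

Triple = Tuple[str, str, str]  # (argument 1, relation, argument 2)


@dataclass
class Rule:
    """One node of a single-classification ripple-down rule tree (assumed form)."""
    condition: Callable[[str, Optional[Triple]], bool]
    conclusion: Callable[[str, Optional[Triple]], Optional[Triple]]
    except_rule: Optional["Rule"] = None  # tried next when this rule fires
    else_rule: Optional["Rule"] = None    # tried next when this rule does not fire


def evaluate(rule: Optional[Rule], sentence: str,
             triple: Optional[Triple]) -> Optional[Triple]:
    """Return the conclusion of the last rule that fired.

    If no rule fires, the base extractor's output is returned unchanged,
    so the base system supplies the default behaviour.
    """
    result = triple
    while rule is not None:
        if rule.condition(sentence, triple):
            result = rule.conclusion(sentence, triple)
            rule = rule.except_rule
        else:
            rule = rule.else_rule
    return result


def toy_base_extractor(sentence: str) -> Optional[Triple]:
    """Hypothetical stand-in for the base OIE system; handles only 'X is in Y'."""
    left, sep, right = sentence.partition(" is in ")
    if not sep:
        return None
    return (left.strip(), "is in", right.strip())


# Illustrative correction rule: informal text that drops the copula.
def missing_copula(sentence: str, triple: Optional[Triple]) -> bool:
    return triple is None and " in " in sentence


def insert_copula(sentence: str, triple: Optional[Triple]) -> Optional[Triple]:
    left, _, right = sentence.partition(" in ")
    return (left.strip(), "is in", right.strip())


# Illustrative exception, added later: questions assert nothing.
def is_question(sentence: str, triple: Optional[Triple]) -> bool:
    return sentence.rstrip().endswith("?")


def reject(sentence: str, triple: Optional[Triple]) -> Optional[Triple]:
    return None


ROOT = Rule(missing_copula, insert_copula,
            except_rule=Rule(is_question, reject))


def hybrid_extract(sentence: str) -> Optional[Triple]:
    """Run the base extractor, then let the rule tree correct its output."""
    return evaluate(ROOT, sentence, toy_base_extractor(sentence))


if __name__ == "__main__":
    for s in ["Sydney is in Australia", "sydney in australia", "sydney in australia?"]:
        print(s, "->", hybrid_extract(s))
```

In this sketch, a sentence the base extractor already handles passes through untouched, which mirrors the abstract's point that the base system sets the baseline. Corrections are added incrementally as rules and exceptions to rules, without modifying the base extractor.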



Author information

Authors and Affiliations

The University of New South Wales, Sydney, NSW, Australia
Myung Hee Kim & Paul Compton


Editor information

Editors and Affiliations

Department of Computing, Macquarie University, 2109, North Ryde, NSW, Australia
Deborah Richards
School of Computing and Information Systems, University of Tasmania, 7000, Hobart, Tasmania, Australia
Byeong Ho Kang

About this paper

Cite this paper

Kim, M.H., Compton, P. (2012). Improving Open Information Extraction for Informal Web Documents with Ripple-Down Rules. In: Richards, D., Kang, B.H. (eds) Knowledge Management and Acquisition for Intelligent Systems. PKAW 2012. Lecture Notes in Computer Science, vol 7457. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32541-0_14

DOI: https://doi.org/10.1007/978-3-642-32541-0_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32540-3
Online ISBN: 978-3-642-32541-0
eBook Packages: Computer Science, Computer Science (R0)
