Improving Open Information Extraction for Informal Web Documents with Ripple-Down Rules

  • Conference paper
Knowledge Management and Acquisition for Intelligent Systems (PKAW 2012)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7457)

Abstract

The World Wide Web contains a massive amount of information in unstructured natural language, and obtaining valuable information from informally written Web documents is a major research challenge. One research focus is Open Information Extraction (OIE), which aims to develop relation-independent information extraction. OIE systems seek to extract all potential relations from text rather than a few pre-defined relations. Existing OIE systems have mainly focused on the Web’s heterogeneity rather than its informality. The performance of the REVERB system, a state-of-the-art OIE system, drops dramatically as informality increases in Web documents.

This paper proposes a Hybrid Ripple-Down Rules based Open Information Extraction (Hybrid RDROIE) system, which uses Ripple-Down Rules (RDR) on top of a conventional OIE system. The Hybrid RDROIE system applies RDR’s incremental learning technique as an add-on to the state-of-the-art REVERB OIE system to correct the performance degradation of REVERB due to the Web’s informality in a domain of interest. With this wrapper approach, the baseline performance is that of the REVERB system, with RDR correcting errors in a domain of interest. The Hybrid RDROIE system doubled REVERB’s performance in a domain of interest after two hours of training.
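The wrapper idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Rule` structure, the `toy_baseline` extractor (a stand-in for a real OIE system such as REVERB), and the example rules are all hypothetical, chosen only to show how a rule tree with exceptions might correct a baseline extractor while leaving its output untouched when no rule fires.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (argument 1, relation, argument 2)


@dataclass
class Rule:
    """One node of a ripple-down rule tree (hypothetical structure)."""
    condition: Callable[[str, Optional[Triple]], bool]
    conclusion: Callable[[str, Optional[Triple]], Optional[Triple]]
    exceptions: List["Rule"] = field(default_factory=list)


def apply_rules(rules, sentence, triple):
    """Return (fired, result) for the first rule whose condition holds.

    If one of that rule's exceptions also fires, the exception's
    conclusion replaces the parent's conclusion.
    """
    for rule in rules:
        if rule.condition(sentence, triple):
            fired, refined = apply_rules(rule.exceptions, sentence, triple)
            return True, (refined if fired else rule.conclusion(sentence, triple))
    return False, triple


def hybrid_extract(baseline, rules, sentence):
    """Run the baseline extractor, then let the rules correct its output."""
    triple = baseline(sentence)
    _, result = apply_rules(rules, sentence, triple)
    return result


def toy_baseline(sentence):
    """Stand-in for a real OIE system: splits on the first ' is '."""
    left, sep, right = sentence.partition(" is ")
    return (left, "is", right) if sep else None


def fix_iz(sentence, triple):
    """Correction for the informal spelling 'iz' that the baseline misses."""
    left, _, right = sentence.partition(" iz ")
    return (left, "is", right)


# A rule added after seeing a baseline miss, plus an exception added
# later to reject questions. Both are illustrative assumptions.
RULES = [
    Rule(
        condition=lambda s, t: t is None and " iz " in s,
        conclusion=fix_iz,
        exceptions=[
            Rule(
                condition=lambda s, t: s.endswith("?"),
                conclusion=lambda s, t: None,
            )
        ],
    )
]

print(hybrid_extract(toy_baseline, RULES, "Sydney is a city"))
print(hybrid_extract(toy_baseline, RULES, "sydney iz a city"))
print(hybrid_extract(toy_baseline, RULES, "sydney iz a city?"))
```

In this sketch the first sentence passes through unchanged because no rule fires, which mirrors the abstract's point that the baseline performance is that of the underlying system. The second sentence is recovered by the added rule, and the third is rejected by the exception attached to that rule.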

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kim, M.H., Compton, P. (2012). Improving Open Information Extraction for Informal Web Documents with Ripple-Down Rules. In: Richards, D., Kang, B.H. (eds) Knowledge Management and Acquisition for Intelligent Systems. PKAW 2012. Lecture Notes in Computer Science, vol 7457. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32541-0_14

  • DOI: https://doi.org/10.1007/978-3-642-32541-0_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-32540-3

  • Online ISBN: 978-3-642-32541-0

  • eBook Packages: Computer Science, Computer Science (R0)
