Part of the book series: Studies in Computational Intelligence ((SCI,volume 220))

Abstract

Extraction ontologies represent a novel paradigm in web information extraction (one of the ‘deductive’ species of web mining). They make it possible to proceed swiftly from initial domain modelling to a running functional prototype, without having to collect and label large amounts of training examples. The bottlenecks of this approach, however, are the tedium of developing an extraction ontology that adequately covers the semantic scope of the web data to be processed, and the difficulty of combining the ontology-based approach with inductive or wrapper-based approaches. We report on an ongoing project that aims to develop a web information extraction tool based on richly structured extraction ontologies, with the additional possibility of (1) semi-automatically constructing these ontologies from third-party domain ontologies, (2) absorbing the results of inductive learning for subtasks where pre-labelled data abound, and (3) actively exploiting formatting regularities in the wrapper style.

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Labský, M., Svátek, V., Nekvasil, M., Rak, D. (2009). The Ex Project: Web Information Extraction Using Extraction Ontologies. In: Berendt, B., et al. Knowledge Discovery Enhanced with Semantic and Social Information. Studies in Computational Intelligence, vol 220. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01891-6_5

  • DOI: https://doi.org/10.1007/978-3-642-01891-6_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-01890-9

  • Online ISBN: 978-3-642-01891-6

  • eBook Packages: Engineering, Engineering (R0)
