Combining Multiple Sources of Evidence in Web Information Extraction

  • Conference paper
Foundations of Intelligent Systems (ISMIS 2008)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4994)

Abstract

Extraction of meaningful content from collections of web pages with unknown structure is a challenging task, which can only be successfully accomplished by exploiting multiple heterogeneous resources. In the Ex information extraction tool, so-called extraction ontologies are used by human designers to specify the domain semantics, to manually provide extraction evidence, as well as to define extraction subtasks to be carried out via trainable classifiers. Elements of an extraction ontology can be endowed with probability estimates, which are used for selection and ranking of attribute and instance candidates to be extracted. At the same time, HTML formatting regularities are locally exploited.
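
As a rough illustration of how probability estimates attached to pieces of evidence could drive the selection and ranking of candidates, the sketch below combines per-evidence estimates in odds form and sorts candidates by the result. This is not the Ex tool's actual scoring method, which the abstract does not specify. The conditional-independence assumption, the prior, the threshold, and all names (`Candidate`, `combine`, `rank`) are hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Candidate:
    text: str
    # Hypothetical estimates P(candidate is correct | evidence_i),
    # each assumed to lie strictly between 0 and 1.
    estimates: list[float] = field(default_factory=list)


def combine(prior: float, estimates: list[float]) -> float:
    """Combine estimates in odds form, assuming the pieces of evidence
    are conditionally independent (an assumption of this sketch)."""
    prior_odds = prior / (1.0 - prior)
    odds = prior_odds
    for p in estimates:
        # Each estimate contributes its odds relative to the prior odds.
        odds *= (p / (1.0 - p)) / prior_odds
    return odds / (1.0 + odds)


def rank(candidates: list[Candidate], prior: float = 0.5,
         threshold: float = 0.5) -> list[tuple[float, Candidate]]:
    """Keep candidates whose combined score reaches the threshold,
    sorted from most to least probable."""
    scored = [(combine(prior, c.estimates), c) for c in candidates]
    kept = [(score, c) for score, c in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    candidates = [
        Candidate("Acme X200", [0.8, 0.9]),   # e.g. pattern match + formatting cue
        Candidate("Contact us", [0.3, 0.4]),
        Candidate("X200 Pro", [0.7]),
    ]
    for score, cand in rank(candidates):
        print(f"{score:.3f}  {cand.text}")
```

With no evidence the combined score equals the prior, and evidence whose estimate equals the prior leaves the score unchanged, which makes the prior a neutral baseline in this sketch.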

Editor information

Aijun An, Stan Matwin, Zbigniew W. Raś, Dominik Ślęzak

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Labský, M., Svátek, V. (2008). Combining Multiple Sources of Evidence in Web Information Extraction. In: An, A., Matwin, S., Raś, Z.W., Ślęzak, D. (eds) Foundations of Intelligent Systems. ISMIS 2008. Lecture Notes in Computer Science, vol 4994. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68123-6_51

  • DOI: https://doi.org/10.1007/978-3-540-68123-6_51

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-68122-9

  • Online ISBN: 978-3-540-68123-6

  • eBook Packages: Computer Science, Computer Science (R0)
