Abstract
Extraction of meaningful content from collections of web pages with unknown structure is a challenging task, which can only be successfully accomplished by exploiting multiple heterogeneous resources. In the Ex information extraction tool, so-called extraction ontologies are used by human designers to specify the domain semantics, to manually provide extraction evidence, as well as to define extraction subtasks to be carried out via trainable classifiers. Elements of an extraction ontology can be endowed with probability estimates, which are used for selection and ranking of attribute and instance candidates to be extracted. At the same time, HTML formatting regularities are locally exploited.
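The abstract does not state how the probability estimates are combined, so the following is only a minimal sketch of one common way to rank candidates from several pieces of evidence. It assumes each piece of evidence carries an estimate P(attribute | evidence) and that evidence is conditionally independent, which gives an odds-likelihood update. The names `Candidate`, `score`, and `rank`, the evidence labels, and all numbers are hypothetical and are not taken from the Ex tool.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Candidate:
    text: str
    # Hypothetical evidence names mapped to whether the evidence holds here.
    evidence: Dict[str, bool]


def to_odds(p: float) -> float:
    """Convert a probability in (0, 1) to odds."""
    return p / (1.0 - p)


def to_prob(odds: float) -> float:
    """Convert odds back to a probability."""
    return odds / (1.0 + odds)


def score(candidate: Candidate, prior: float, precision: Dict[str, float]) -> float:
    """Estimate P(attribute | observed evidence) for one candidate.

    Assumption (not from the paper): evidence is conditionally independent,
    so each observed evidence E multiplies the odds by O(A | E) / O(A).
    All probabilities must lie strictly between 0 and 1.
    """
    prior_odds = to_odds(prior)
    odds = prior_odds
    for name, holds in candidate.evidence.items():
        if holds and name in precision:
            odds *= to_odds(precision[name]) / prior_odds
    return to_prob(odds)


def rank(candidates: List[Candidate], prior: float,
         precision: Dict[str, float]) -> List[Candidate]:
    """Sort candidates from most to least probable."""
    return sorted(candidates, key=lambda c: score(c, prior, precision), reverse=True)


# Illustrative values only.
PRIOR = 0.1
PRECISION = {"value_pattern": 0.8, "bold_formatting": 0.3}

candidates = [
    Candidate("no evidence", {"value_pattern": False, "bold_formatting": False}),
    Candidate("pattern only", {"value_pattern": True, "bold_formatting": False}),
    Candidate("pattern and bold", {"value_pattern": True, "bold_formatting": True}),
]

ranked = rank(candidates, PRIOR, PRECISION)
```

With these illustrative numbers, a candidate supported by no evidence keeps the prior, and each additional supporting piece of evidence raises its rank. A real system would also need to handle negative evidence and dependencies between evidence, which this sketch ignores.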
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Labský, M., Svátek, V. (2008). Combining Multiple Sources of Evidence in Web Information Extraction. In: An, A., Matwin, S., Raś, Z.W., Ślęzak, D. (eds) Foundations of Intelligent Systems. ISMIS 2008. Lecture Notes in Computer Science, vol 4994. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68123-6_51
DOI: https://doi.org/10.1007/978-3-540-68123-6_51
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68122-9
Online ISBN: 978-3-540-68123-6
eBook Packages: Computer Science, Computer Science (R0)