Abstract
Because of the huge amount of text data on the WWW, annotating unstructured text with semantic markup is a crucial topic in Semantic Web research. This work formally analyzes the incorporation of domain ontologies into information extraction tasks in iDocument. Ontology-based information extraction exploits domain ontologies, which hold formalized and structured domain knowledge, to extract domain-relevant information from unannotated, unstructured text. iDocument provides a pipeline architecture, an extraction template interface, and the ability to exchange domain ontologies for performing information extraction tasks. This work outlines iDocument's ontology-based architecture, the use of SPARQL queries as extraction templates, and an evaluation of iDocument in an automatic document annotation scenario.
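The abstract combines three ingredients: a processing pipeline, a domain ontology used as exchangeable background knowledge, and SPARQL queries used as extraction templates. The following is a hypothetical, heavily simplified sketch of how such pieces could fit together; it is not iDocument's implementation. Assumptions: the ontology is a toy dictionary of labels and relations with made-up `ex:` URIs, instance recognition is plain string matching, and the template is a SPARQL-style basic graph pattern evaluated by a small hand-written matcher instead of a real SPARQL engine. The names `LABELS`, `RELATIONS`, `recognize`, `annotate` and `select` are illustrative only.

```python
import re

# Hypothetical toy domain ontology (NOT iDocument's data model):
# surface label -> (instance URI, class URI)
LABELS = {
    "Alice Smith": ("ex:alice", "ex:Person"),
    "ACME Lab": ("ex:acme", "ex:Organization"),
    "Springfield": ("ex:springfield", "ex:City"),
}
# Background relations stored in the toy ontology as (subject, predicate, object).
RELATIONS = [
    ("ex:alice", "ex:worksAt", "ex:acme"),
    ("ex:acme", "ex:locatedIn", "ex:springfield"),
]


def recognize(text):
    """Stage 1: find ontology instance labels in the text (gazetteer lookup)."""
    mentions = []
    for label, (uri, cls) in LABELS.items():
        for m in re.finditer(re.escape(label), text):
            mentions.append((m.start(), m.end(), uri, cls))
    return sorted(mentions)  # ordered by position in the text


def annotate(doc_uri, mentions):
    """Stage 2: turn mentions into RDF-style triples, adding known relations."""
    facts = []
    found = {uri for _, _, uri, _ in mentions}
    for _, _, uri, cls in mentions:
        for triple in ((doc_uri, "ex:mentions", uri), (uri, "rdf:type", cls)):
            if triple not in facts:
                facts.append(triple)
    # Keep only ontology relations whose endpoints both occur in the text.
    facts += [r for r in RELATIONS if r[0] in found and r[2] in found]
    return facts


def _unify(pattern, fact, binding):
    """Extend a variable binding so that one triple pattern matches one fact."""
    new = dict(binding)
    for term, value in zip(pattern, fact):
        if term.startswith("?"):
            # Bind an unbound variable, or reject a conflicting earlier binding.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def select(variables, where, facts):
    """Stage 3: evaluate a SPARQL-like basic graph pattern as a template."""
    solutions = [{}]
    for pattern in where:
        extended = []
        for binding in solutions:
            for fact in facts:
                new = _unify(pattern, fact, binding)
                if new is not None:
                    extended.append(new)
        solutions = extended
    return [tuple(s[v] for v in variables) for s in solutions]


text = "Alice Smith presented her results at ACME Lab in Springfield."
facts = annotate("doc:1", recognize(text))
# Roughly: SELECT ?p ?o WHERE { ?p rdf:type ex:Person . ?p ex:worksAt ?o . }
template = [("?p", "rdf:type", "ex:Person"), ("?p", "ex:worksAt", "?o")]
print(select(["?p", "?o"], template, facts))  # [('ex:alice', 'ex:acme')]
```

Replacing `LABELS` and `RELATIONS` with a different toy ontology changes what is extracted without touching the pipeline functions, which loosely mirrors the exchangeable-ontology idea; a real system would keep the knowledge in an RDF store and run the templates with a full SPARQL engine.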
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Adrian, B., Hees, J., van Elst, L., Dengel, A. (2009). iDocument: Using Ontologies for Extracting and Annotating Information from Unstructured Text. In: Mertsching, B., Hund, M., Aziz, Z. (eds) KI 2009: Advances in Artificial Intelligence. KI 2009. Lecture Notes in Computer Science, vol 5803. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04617-9_32
DOI: https://doi.org/10.1007/978-3-642-04617-9_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04616-2
Online ISBN: 978-3-642-04617-9
eBook Packages: Computer Science, Computer Science (R0)