Summary
A bottleneck for semantic web services is the lack of semantically annotated information. We address linguistic information extraction from Czech web texts for the purpose of semantic annotation. The method described in the paper exploits existing linguistic tools originally created for a syntactically annotated corpus, the Prague Dependency Treebank (PDT 2.0). We propose a system that captures the text of web pages, annotates it linguistically with PDT tools, extracts data, and stores the data in an ontology. We focus on the third phase, data extraction, and present methods for learning queries over linguistically annotated data. Our experiments in the domain of traffic accident reports enable, for example, summarizing the number of injured people; this serves as a proof of concept of our solution. Further experiments, with different queries and different domains, are planned. The aim is to improve third-party semantic annotation of web resources.
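To make the extraction phase more concrete, the following is a minimal sketch of querying a dependency tree for the number of injured people. It is an illustration only, not the authors' method or the PDT tool interface: the `Node` class, the `extract_injured` function, the toy trees, and the choice of the lemma "zranit" (to injure) are all assumptions, and the role labels (`PRED`, `PAT`, `RSTR`, `LOC`) are used loosely in the spirit of tectogrammatical functors.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Node:
    """One node of a toy dependency tree (hypothetical, not the real PDT data model)."""
    lemma: str
    functor: str = ""  # illustrative role label, e.g. "PAT" for patient
    children: List["Node"] = field(default_factory=list)

    def walk(self) -> Iterator["Node"]:
        """Yield this node and all of its descendants, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()


def extract_injured(tree: Node, verb_lemma: str = "zranit") -> Optional[int]:
    """Return the numeral modifying the patient of the first `verb_lemma` node.

    Returns None when the tree contains no matching pattern.
    """
    for node in tree.walk():
        if node.lemma != verb_lemma:
            continue
        for arg in node.children:
            if arg.functor != "PAT":
                continue
            for mod in arg.children:
                if mod.lemma.isdigit():
                    return int(mod.lemma)
    return None


# Toy tree for a sentence like "2 people were injured in the accident."
sentence = Node("zranit", "PRED", [
    Node("člověk", "PAT", [Node("2", "RSTR")]),
    Node("nehoda", "LOC"),
])

# A second hypothetical report, used to show summarization across reports.
reports = [
    sentence,
    Node("zranit", "PRED", [Node("člověk", "PAT", [Node("3", "RSTR")])]),
]
counts = (extract_injured(tree) for tree in reports)
total = sum(n for n in counts if n is not None)
```

Here the query is hand-written; the paper's focus is on learning such queries from annotated data, which this sketch does not attempt.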
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Dědek, J., Vojtáš, P. (2008). Linguistic Extraction for Semantic Annotation. In: Badica, C., Mangioni, G., Carchiolo, V., Burdescu, D.D. (eds) Intelligent Distributed Computing, Systems and Applications. Studies in Computational Intelligence, vol 162. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85257-5_9
DOI: https://doi.org/10.1007/978-3-540-85257-5_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85256-8
Online ISBN: 978-3-540-85257-5
eBook Packages: Engineering, Engineering (R0)