Information Extraction for Czech Based on Syntactic Analysis

Baisa, Vít; Kovář, Vojtěch

doi:10.1007/978-3-319-08958-4_13

Vít Baisa & Vojtěch Kovář

Conference paper
First Online: 01 January 2014

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8387)

Abstract

We present a complex pipeline of natural language processing tools for Czech that extracts the basic facts presented in a text. The input to the tool is plain text; the output contains verb and noun phrases with a basic semantic classification. Automatic syntactic analysis of Czech plays a crucial role in the pipeline. In this paper, we describe the individual tools used in the system, give an example of its usage, and conclude with a basic evaluation of the overall system accuracy.

Notes

1. For a full reference, see http://nlp.fi.muni.cz/projects/ajka/.
2. For a full reference, see http://nlp.fi.muni.cz/projects/set.
3. http://nlp.fi.muni.cz/projekty/set/efa/wwwefa.cgi/first_page

Acknowledgements

This work has been partly supported by the Ministry of the Interior of the Czech Republic within the project VF20102014003 and by the Czech Science Foundation under the projects P401/10/0792 and 407/07/0679.

We would like to thank all our colleagues who participated in developing the tools and data sources used.

Author information

Authors and Affiliations

Masaryk University, Botanická 68a, 602 00, Brno, Czech Republic
Vít Baisa & Vojtěch Kovář

Corresponding author

Correspondence to Vít Baisa.

Editor information

Editors and Affiliations

Adam Mickiewicz University, Poznań, Poland
Zygmunt Vetulani
IMMI-CNRS, Orsay, France
Joseph Mariani

About this paper

Cite this paper

Baisa, V., Kovář, V. (2014). Information Extraction for Czech Based on Syntactic Analysis. In: Vetulani, Z., Mariani, J. (eds) Human Language Technology Challenges for Computer Science and Linguistics. LTC 2011. Lecture Notes in Computer Science, vol 8387. Springer, Cham. https://doi.org/10.1007/978-3-319-08958-4_13

DOI: https://doi.org/10.1007/978-3-319-08958-4_13
Published: 26 July 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-08957-7
Online ISBN: 978-3-319-08958-4
eBook Packages: Computer Science, Computer Science (R0)
