Annotated Corpus of Polish Spoken Dialogues

Mykowiecka, Agnieszka; Marasek, Krzysztof; Marciniak, Małgorzata; Rabiega-Wiśniewska, Joanna; Gubrynowicz, Ryszard

doi:10.1007/978-3-642-04235-5_5

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5603)

Included in the following conference series:

Language and Technology Conference

Abstract

The paper presents a corpus of Polish spoken dialogues created within the LUNA (spoken Language UNderstanding in multilinguAl communication systems) project. We describe how the corpus was collected and how it was annotated on several levels, from the transcription of dialogues and their morphosyntactic analysis to semantic annotation at the concept and predicate levels. Annotation on the morphosyntactic and semantic levels was performed automatically and then corrected manually. At the concept level, the annotation scheme comprises about 200 concepts from an ontology designed specifically for the project. The set of frames for predicate-level annotation was defined as a FrameNet-like resource.

Author information

Authors and Affiliations

Institute of Computer Science, Polish Academy of Sciences, J.K. Ordona 21, 01-237, Warsaw, Poland
Agnieszka Mykowiecka, Małgorzata Marciniak & Joanna Rabiega-Wiśniewska
Polish Japanese Institute of Information Technology, Koszykowa 86, 02-008, Warsaw, Poland
Agnieszka Mykowiecka, Krzysztof Marasek & Ryszard Gubrynowicz

Editor information

Editors and Affiliations

Faculty of Mathematics and Computer Science, Adam Mickiewicz University in Poznań, ul. Umultowska 87, P.O. Box, 61614, Poznań, Poland
Zygmunt Vetulani
Language Technology Lab, German Research Center for Artificial Intelligence (DFKI), Campus D 3 1, Stuhlsatzenhausweg 3, D-66123, Saarbrücken, Germany
Hans Uszkoreit

About this paper

Cite this paper

Mykowiecka, A., Marasek, K., Marciniak, M., Rabiega-Wiśniewska, J., Gubrynowicz, R. (2009). Annotated Corpus of Polish Spoken Dialogues. In: Vetulani, Z., Uszkoreit, H. (eds) Human Language Technology. Challenges of the Information Society. LTC 2007. Lecture Notes in Computer Science, vol 5603. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04235-5_5

DOI: https://doi.org/10.1007/978-3-642-04235-5_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04234-8
Online ISBN: 978-3-642-04235-5
eBook Packages: Computer Science, Computer Science (R0)