Abstract
The paper presents a corpus of Polish spoken dialogues created within the LUNA (spoken Language UNderstanding in multilinguAl communication systems) project. We describe how the corpus was collected and how it was annotated on several levels, from the transcription of the dialogues and their morphosyntactic analysis to the semantic annotation of concepts and predicates. Annotation at the morphosyntactic and semantic levels was performed automatically and then corrected manually. At the concept level, the annotation scheme comprises about 200 concepts from an ontology designed specifically for the project. The set of frames for predicate-level annotation was defined as a FrameNet-like resource.
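The abstract describes several annotation layers stacked over the same transcribed utterance: tokens with morphosyntactic information, concept spans drawn from a domain ontology, and FrameNet-like frames whose roles point back to token spans. The sketch below is a hypothetical in-memory representation of such layered annotation, not the corpus's actual file format. The class names, the simplified tag values, the concept name `GOAL_LOCATION`, the frame name `Travel`, and the example utterance are all illustrative assumptions and are not taken from the paper's tagset, ontology, or frame inventory.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    form: str   # surface form from the transcription
    lemma: str  # base form from morphosyntactic analysis
    tag: str    # morphosyntactic tag (simplified, hypothetical values)


@dataclass
class ConceptSpan:
    concept: str  # concept name from a domain ontology (hypothetical)
    start: int    # index of the first token (inclusive)
    end: int      # index after the last token (exclusive)


@dataclass
class Frame:
    name: str    # FrameNet-like frame name (hypothetical)
    target: int  # index of the token that evokes the frame
    # frame element (role) name -> token span [start, end)
    roles: dict[str, tuple[int, int]] = field(default_factory=dict)


@dataclass
class Utterance:
    tokens: list[Token]
    concepts: list[ConceptSpan] = field(default_factory=list)
    frames: list[Frame] = field(default_factory=list)

    def span_text(self, start: int, end: int) -> str:
        """Return the surface text covered by a token span."""
        return " ".join(t.form for t in self.tokens[start:end])

    def validate(self) -> None:
        """Check that every semantic layer points at existing tokens."""
        n = len(self.tokens)
        for c in self.concepts:
            if not 0 <= c.start < c.end <= n:
                raise ValueError(f"bad concept span: {c}")
        for f in self.frames:
            if not 0 <= f.target < n:
                raise ValueError(f"bad frame target: {f}")
            for role, (s, e) in f.roles.items():
                if not 0 <= s < e <= n:
                    raise ValueError(f"bad span for role {role!r}")


# Invented example: "jak dojechać na Dworzec Centralny"
# ("how do I get to the Central Station")
utt = Utterance(
    tokens=[
        Token("jak", "jak", "adv"),
        Token("dojechać", "dojechać", "inf"),
        Token("na", "na", "prep"),
        Token("Dworzec", "dworzec", "subst"),
        Token("Centralny", "centralny", "adj"),
    ],
    concepts=[ConceptSpan("GOAL_LOCATION", 3, 5)],
    frames=[Frame("Travel", target=1, roles={"Goal": (2, 5)})],
)
utt.validate()
print(utt.span_text(3, 5))  # Dworzec Centralny
```

Keeping the semantic layers as index spans over one shared token list (rather than copying text into each layer) means a manual correction to the transcription or morphosyntax stays visible to every layer above it, and `validate` catches spans left dangling by such edits.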
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Mykowiecka, A., Marasek, K., Marciniak, M., Rabiega-Wiśniewska, J., Gubrynowicz, R. (2009). Annotated Corpus of Polish Spoken Dialogues. In: Vetulani, Z., Uszkoreit, H. (eds) Human Language Technology. Challenges of the Information Society. LTC 2007. Lecture Notes in Computer Science, vol 5603. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04235-5_5
DOI: https://doi.org/10.1007/978-3-642-04235-5_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04234-8
Online ISBN: 978-3-642-04235-5
eBook Packages: Computer Science, Computer Science (R0)