Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5603)

Abstract

The paper presents a corpus of Polish spoken dialogues produced by the LUNA (spoken Language UNderstanding in multilinguAl communication systems) project. We describe how the corpus was collected and annotated on several levels, from transcription of the dialogues and their morphosyntactic analysis to semantic annotation with concepts and predicates. Annotation on the morphosyntactic and semantic levels was performed automatically and then corrected manually. At the concept level, the annotation scheme comprises about 200 concepts from an ontology designed specifically for the project. The set of frames for predicate-level annotation was defined as a FrameNet-like resource.

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Mykowiecka, A., Marasek, K., Marciniak, M., Rabiega-Wiśniewska, J., Gubrynowicz, R. (2009). Annotated Corpus of Polish Spoken Dialogues. In: Vetulani, Z., Uszkoreit, H. (eds) Human Language Technology. Challenges of the Information Society. LTC 2007. Lecture Notes in Computer Science, vol 5603. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04235-5_5

  • DOI: https://doi.org/10.1007/978-3-642-04235-5_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-04234-8

  • Online ISBN: 978-3-642-04235-5

  • eBook Packages: Computer Science, Computer Science (R0)
