Where Do Parsing Errors Come From

The Case of Spoken Estonian

  • Conference paper
Text, Speech and Dialogue (TSD 2008)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5246)

Abstract

This paper discusses issues in developing a parser for spoken Estonian that is based on an existing parser for written language and uses the Constraint Grammar framework.

When a corpus of face-to-face everyday conversations was used as the training and testing material, the parser achieved a recall of 97.6% and a precision of 91.8%. Parsing institutional phone calls turned out to be a more complicated task, with recall dropping by 3%. In this paper, we focus on parsing nonfluent speech with a rule-based parser. We give an overview of parsing errors and ways to overcome them.
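The abstract does not spell out how recall and precision are computed. As a minimal sketch, assume the convention often used for Constraint Grammar parsers, where the parser may leave more than one syntactic tag on a word: recall is the share of words whose correct tag survives, and precision is the share of emitted tags that are correct. The function name, the tag labels, and the toy data below are illustrative assumptions and are not taken from the paper.

```python
from typing import List, Set, Tuple


def recall_precision(predicted: List[Set[str]], gold: List[str]) -> Tuple[float, float]:
    """Score a parser that may leave several candidate tags per word.

    predicted: one set of remaining tags per word (hypothetical parser output).
    gold: the single correct tag per word (hypothetical manual annotation).
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must cover the same words")

    # A word counts as correct if the gold tag is among the remaining tags.
    correct = sum(1 for tags, g in zip(predicted, gold) if g in tags)
    # Every tag left on a word counts as emitted, so ambiguity lowers precision.
    emitted = sum(len(tags) for tags in predicted)

    recall = correct / len(gold) if gold else 0.0
    precision = correct / emitted if emitted else 0.0
    return recall, precision


# Toy example: four words, one left ambiguous, one tagged wrongly.
predicted = [{"@SUBJ"}, {"@FMV"}, {"@OBJ", "@ADVL"}, {"@ADVL"}]
gold = ["@SUBJ", "@FMV", "@OBJ", "@OBJ"]
recall, precision = recall_precision(predicted, gold)
print(f"recall={recall:.2f} precision={precision:.2f}")
```

Under this assumed definition, leaving a word ambiguous protects recall at the cost of precision, which is one way a parser can reach a recall well above its precision.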

Editor information

Petr Sojka, Aleš Horák, Ivan Kopeček, Karel Pala

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Müürisep, K., Nigol, H. (2008). Where Do Parsing Errors Come From. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2008. Lecture Notes in Computer Science, vol 5246. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87391-4_22

  • DOI: https://doi.org/10.1007/978-3-540-87391-4_22

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-87390-7

  • Online ISBN: 978-3-540-87391-4

  • eBook Packages: Computer Science, Computer Science (R0)
