Did You Say What I Think You Said?

Towards a Language-Based Measurement of a Speech Recognizer’s Confidence

  • Conference paper
Text, Speech and Dialogue (TSD 2012)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7499)

Abstract

In this paper we discuss the problem that, in a dialogue system, the speech recognizer should be able to estimate whether recognition has failed, even when no correct transcription of the actual user utterance is available. Only with such a diagnosis can the dialogue system choose an adequate repair strategy, try to recover from the interaction problem with the user, and avoid negative consequences for the successful completion of the dialogue. We present a data collection for a controlled out-of-vocabulary scenario and discuss an approach that estimates the success of a speech recognizer’s results by exploring differences between the N-gram distribution in the best word chain and that in the language model. Our experimental results indicate that these differences are significant when speech recognition fails severely. From these results, we derive a quick test for failed recognition that is based on a negative language model.
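The abstract does not specify which statistic is used to compare the two N-gram distributions. As a minimal sketch only, assuming a bigram back-off language model, the code below scores the best word chain by the share of its bigrams that never occur in the language model's training data and flags the result as a likely failure above a cut-off. The helper names (`train_bigram_inventory`, `unseen_bigram_rate`, `looks_failed`), the 0.5 threshold, and the toy corpus are hypothetical; this heuristic is a stand-in for illustration, not the paper's negative-language-model test.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical sketch: not the authors' implementation.
Bigram = Tuple[str, str]


def bigrams(words: List[str]) -> List[Bigram]:
    """Return the word bigrams of a sentence, padded with <s> and </s>."""
    padded = ["<s>"] + words + ["</s>"]
    return list(zip(padded, padded[1:]))


def train_bigram_inventory(corpus: Iterable[List[str]]) -> Counter:
    """Count every bigram seen in the language model's training sentences."""
    counts: Counter = Counter()
    for sentence in corpus:
        counts.update(bigrams(sentence))
    return counts


def unseen_bigram_rate(hypothesis: List[str], inventory: Counter) -> float:
    """Share of bigrams in the best word chain that never occur in the LM
    training data, i.e. positions where a back-off LM would have to back off."""
    grams = bigrams(hypothesis)  # always at least one bigram thanks to padding
    unseen = sum(1 for g in grams if inventory[g] == 0)
    return unseen / len(grams)


def looks_failed(hypothesis: List[str], inventory: Counter,
                 threshold: float = 0.5) -> bool:
    """Flag a result as a likely recognition failure when too many of its
    bigrams are foreign to the LM. The 0.5 threshold is an assumption."""
    return unseen_bigram_rate(hypothesis, inventory) > threshold


# Toy usage with an invented command corpus.
corpus = [["go", "to", "the", "kitchen"], ["go", "to", "the", "door"], ["turn", "left"]]
inventory = train_bigram_inventory(corpus)
print(looks_failed(["go", "to", "the", "door"], inventory))   # False
print(looks_failed(["door", "left", "the", "go"], inventory))  # True
```

A fixed cut-off on a single ratio only illustrates the idea of measuring how far a hypothesis strays from what the language model has seen; a practical test would need a threshold tuned on real recognition data.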

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ludwig, B., Hitzenberger, L. (2012). Did You Say What I Think You Said? In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2012. Lecture Notes in Computer Science, vol 7499. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32790-2_52

  • DOI: https://doi.org/10.1007/978-3-642-32790-2_52

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-32789-6

  • Online ISBN: 978-3-642-32790-2

  • eBook Packages: Computer Science (R0)