Some Special Problems of Speech Communication

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4629)

Abstract

We start with a brief overview of our work in speech recognition and understanding, which led from monomodal (speech-only) human-machine dialog to multimodal human-machine interaction and assistance. Our work in speech communication initially aimed to develop a complete system for question answering by spoken dialog [7,15]. This goal was achieved in various projects funded by the German Research Foundation [14] and the German Federal Ministry of Education and Research [16]. Problems of multilingual communication were considered in projects supported by the European Union [2,4,10]. In the Verbmobil project the speech-to-speech translation problem was investigated, and it turned out that prosody and the recognition of emotion were important and extremely useful, if not indispensable, for disambiguating utterances and influencing the dialog strategy [3,17]. Multimodal and multimedia aspects of human-machine communication became a topic in the follow-up projects Embassi [11], SmartKom [1], FORSIP [12], and SmartWeb [9].

The SmartWeb project [19], which involves 17 partners from companies, research institutes, and universities, has the general goal of providing the foundations for multimodal human-machine communication with distributed semantic web services using different mobile devices, whether handheld or mounted in a car or on a motorcycle. It uses speech and video signals as well as signals from other sensors, e.g. ECG or skin resistance. A special problem in human-machine interaction and assistance is the question of whether the user is speaking to the machine or not, that is, the distinction between on-talk and off-talk. It is shown how on-/off-talk can be classified by combining prosodic and image features. Using additional sensors, the user state in general is estimated to give further cues to the dialog control. This may be used, for example, to avoid input from the dialog system in a situation where a driver is under stress.

In other projects the special problem of children’s speech processing was considered [20]. Among other things, it was investigated whether manual correction of the automatically computed fundamental frequency F0 and word boundaries might have a positive effect on the automatic classification of the four classes anger, motherese, emphatic, and neutral; this was not the case, leading to the conclusion that there is presently no need for improved F0 algorithms in emotion recognition. The word accuracy (WA) of native and non-native English-speaking children was investigated; it was shown that non-native speakers (age 10–15) achieve about the same WA as children aged 6–7 using a speech recognizer trained with native children’s speech. The recognizer was also used to develop an automatic scoring of the pronunciation quality of children learning English.
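The abstract does not spell out how WA is computed. The following is a minimal sketch assuming the definition commonly used in speech recognition, WA = (N - S - D - I) / N, where N is the number of reference words and S, D, I are the substitutions, deletions, and insertions in a minimum-edit-distance alignment. The function names `edit_counts` and `word_accuracy` are illustrative and not taken from the paper.

```python
def edit_counts(reference, hypothesis):
    """Return (substitutions, deletions, insertions) of a minimum-cost
    word alignment between two lists of words."""
    n, m = len(reference), len(hypothesis)
    # cost[i][j] = (total_errors, S, D, I) for reference[:i] vs. hypothesis[:j]
    cost = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = (0, 0, 0, 0)
    for i in range(1, n + 1):
        cost[i][0] = (i, 0, i, 0)  # every reference word deleted
    for j in range(1, m + 1):
        cost[0][j] = (j, 0, 0, j)  # every hypothesis word inserted
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            t, s, d, ins = cost[i - 1][j - 1]
            if reference[i - 1] == hypothesis[j - 1]:
                diagonal = (t, s, d, ins)          # correct word
            else:
                diagonal = (t + 1, s + 1, d, ins)  # substitution
            t, s, d, ins = cost[i - 1][j]
            deletion = (t + 1, s, d + 1, ins)
            t, s, d, ins = cost[i][j - 1]
            insertion = (t + 1, s, d, ins + 1)
            # Tuples compare by total error count first
            cost[i][j] = min(diagonal, deletion, insertion)
    _, s, d, ins = cost[n][m]
    return s, d, ins


def word_accuracy(reference, hypothesis):
    """Word accuracy (N - S - D - I) / N for whitespace-tokenized strings."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    s, d, ins = edit_counts(ref_words, hyp_words)
    return (len(ref_words) - s - d - ins) / len(ref_words)
```

Under this definition WA can become negative when the recognizer inserts more words than the reference contains, which is one reason it is usually reported alongside the individual error counts.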

A special problem is posed by speech impairments, which may be congenital (e.g. cleft lip and palate) or acquired through disease (e.g. cancer of the larynx). Impairments are treated with, among other things, speech training by speech therapists, who score the speech quality subjectively according to various criteria. The idea is that the WA of an automatic speech recognizer should be highly correlated with the human rating. Using speech samples from laryngectomees, it is shown that the machine rating is about as good as the rating of five human experts and can also be done via telephone. This opens the possibility of an objective and standardized rating of speech quality.
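The abstract does not state which correlation measure was used to compare WA with the human rating. As a sketch only, the Pearson correlation coefficient is one common choice; the function below computes it for per-speaker WA values and mean expert scores. The function name `pearson` and the numbers in the usage lines are hypothetical, made up purely for illustration and not results from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / math.sqrt(var_x * var_y)


# Hypothetical toy data: WA in percent per speaker, and a mean expert
# score on an assumed scale where a lower score means better speech.
wa_percent = [10.0, 20.0, 30.0, 40.0]
expert_score = [5.0, 4.0, 3.0, 2.0]
r = pearson(wa_percent, expert_score)
```

With a rating scale where lower means better, a high WA should go with a low score, so a strongly negative coefficient would indicate agreement between machine and experts.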

References

  1. Adelhardt, J., Shi, R., Frank, C., Zeißler, V., Batliner, A., Nöth, E., Niemann, H.: Multimodal User State Recognition in a Modern Dialogue System. In: Günter, A., Kruse, R., Neumann, B. (eds.) KI 2003. LNCS (LNAI), vol. 2821, pp. 591–605. Springer, Heidelberg (2003)

  2. Aretoulaki, M., Harbeck, S., Gallwitz, F., Nöth, E., Niemann, H., Ivanecki, J., Ipsic, I., Pavešić, N., Matoušek, V.: SQEL: A Multilingual and Multifunctional Dialog System. In: Mannell, R.H., Robert-Ribes, J. (eds.) Proc. International Conference on Spoken Language Processing (ICSLP), Sydney, Australia, pp. 855–858 (1998)

  3. Batliner, A., Huber, R., Niemann, H., Nöth, E., Spilker, J., Fischer, K.: The Recognition of Emotion. In: Wahlster, W. (ed.) Verbmobil: Foundations of Speech-to-Speech Translation, pp. 122–130. Springer, Heidelberg (2000)

  4. Gallwitz, F., Nöth, E., Niemann, H.: Recognition of Out-of-Vocabulary Words and their Semantic Category. In: Matoušek, V., Niemann, H. (eds.) Proc. of the 2nd SQEL Workshop on Multi-Lingual Information Retrieval Dialogs, University of West Bohemia, Pilsen, pp. 114–121 (1997)

  5. Hacker, C., Cincarek, T., Maier, A., Heßler, A., Nöth, E.: Boosting of Prosodic and Pronunciation Features to Detect Mispronunciations of Non-Native Children. In: Proc. Int. Conf. Acoustics, Speech and Signal Processing (ICASSP). Honolulu, Hawaii, vol. 4, pp. 197–200 (2007)

  6. Haderlein, T., Nöth, E., Schuster, M., Eysholdt, U., Rosanowski, F.: Evaluation of Tracheoesophageal Substitute Voices Using Prosodic Features. In: Hoffmann, R., Mixdorff, H. (eds.) Proc. Speech Prosody, 3rd International Conference, Dresden, pp. 701–704 (2006)

  7. Hein, H.W., Niemann, H.: Expert knowledge for automatic understanding of continuous speech. In: Proc. First European Signal Processing Conference (EUSIPCO), Lausanne, Switzerland, pp. 647–651. North Holland Publ. Comp., Amsterdam (1980)

  8. Hönig, F., Batliner, A., Nöth, E.: Fast Recursive Data-Driven Multi-Resolution Feature Extraction for Physiological Signal Classification. In: 3rd Russian-Bavarian Conference on Biomedical Engineering, Erlangen (to appear, 2007)

  9. Horndasch, A., Nöth, E., Batliner, A., Warnke, V.: Phoneme-to-Grapheme Mapping for Spoken Inquiries to the Semantic Web. In: Proc. Interspeech, 2006 ICSLP, 10th International Conference on Spoken Language Processing. Pittsburgh, pp. 13–16 (2006)

  10. Kuhn, T., Niemann, H., Schukat-Talamazzini, E.G., Eckert, W., Rieck, S.: Context-Dependent Modeling in a two Stage HMM Word Recognizer for Continuous Speech. In: Vandewalle, J., Boite, R., Moonen, M., Oosterlinck, A. (eds.) Signal Processing IV, Theories and Applications, Proc. EUSIPCO-92, pp. 439–442. Elsevier, Amsterdam (1992)

  11. Ludwig, B., Niemann, H., Klarner, M., Görz, G.: Content and Context in Dialogue Systems. In: Wilks, Y. (ed.) Proc. of the 3rd Bellagio Workshop on Human-Computer Conversation, Sheffield, pp. 105–111 (2000)

  12. Ludwig, B., Klarner, M., Reiß, P., Görz, G., Niemann, H.: A Pragmatics First Approach to Analysis and Generation of Discourse Relations. In: Buchberger, E. (ed.) 7. Konferenz zur Verarbeitung natürlicher Sprache (KONVENS), Vienna, pp. 117–124 (2004)

  13. Maier, A., Nöth, E., Nkenke, E., Schuster, S.: Automatic Assessment of Children’s Speech with Cleft Lip and Palate. In: Fifth Slovenian and First International Language Technologies Conference. Ljubljana, pp. 31–35 (2006)

  14. Mast, M., Kummert, F., Ehrlich, U., Fink, G., Kuhn, T., Niemann, H., Sagerer, G.: A Speech Understanding and Dialog System with a Homogeneous Linguistic Knowledge Base. IEEE Trans. on Pattern Analysis and Machine Intelligence 16, 179–194 (1994)

  15. Niemann, H., Brietzmann, A., Mühlfeld, R., Regel, P., Schukat, G.: The Speech Understanding and Dialog System EVAR. In: De Mori, R., Suen, C.Y. (eds.) New Systems and Architectures for Automatic Speech Recognition and Synthesis. NATO ASI Series F16, pp. 271–302. Springer, Heidelberg (1985)

  16. Niemann, H., Sagerer, G., Ehrlich, U., Schukat-Talamazzini, E.G., Kummert, F.: The Interaction of Word Recognition and Linguistic Processing in Speech Understanding. In: Laface, P., De Mori, R. (eds.) Speech Recognition and Understanding, Recent Advances, Trends and Applications. NATO ASI Series F75, pp. 425–453. Springer, Heidelberg (1992)

  17. Nöth, E., Batliner, A., Kießling, A., Kompe, R., Niemann, H.: VERBMOBIL: The Use of Prosody in the Linguistic Components of a Speech Understanding System. IEEE Trans. on Speech and Audio Processing 8, 519–532 (2000)

  18. Nöth, E., Hacker, C., Batliner, A.: Does Multimodality Really Help? The Classification of Emotion and of On/Off-Focus in Multimodal Dialogues – Two Case Studies. In: Proc. 49th International Symposium ELMAR-2007, Zadar, Croatia (to appear, 2007)

  19. Reithinger, N., Herzog, G., Blocher, A.: SmartWeb – Mobile Broadband Access to the Semantic Web. Künstliche Intelligenz, 2, 30–33 (2007)

  20. Steidl, S., Stemmer, G., Hacker, C., Nöth, E., Niemann, H.: Improving Children’s Speech Recognition by HMM Interpolation with an Adult’s Speech Recognizer. In: Michaelis, B., Krell, G. (eds.) Pattern Recognition. LNCS, vol. 2781, pp. 600–607. Springer, Heidelberg (2003)

  21. Steidl, S., Levit, M., Batliner, A., Nöth, E., Niemann, H.: Of All Things the Measure is Man. – Classification of Emotions and Inter-Labeler Consistency. In: Proceedings of ICASSP 2005 - International Conference on Acoustics, Speech, and Signal Processing. Philadelphia, pp. 317–320 (2005)

Editor information

Václav Matoušek, Pavel Mautner

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Niemann, H. (2007). Some Special Problems of Speech Communication. In: Matoušek, V., Mautner, P. (eds) Text, Speech and Dialogue. TSD 2007. Lecture Notes in Computer Science, vol 4629. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74628-7_3

  • DOI: https://doi.org/10.1007/978-3-540-74628-7_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-74627-0

  • Online ISBN: 978-3-540-74628-7

  • eBook Packages: Computer Science (R0)
