Abstract
In this paper we examine how well the intelligibility scores assigned by human experts can be predicted automatically. Furthermore, we investigate the differences between the subjective expert raters who evaluated the speech disorders of laryngectomees and of children with cleft lip and palate. We use the recognition rate of a word recognizer together with prosodic features to predict the intelligibility score of each individual expert. For each expert, and for the mean opinion of all experts, we present the features that best model the scoring behavior, selected according to their mean rank over a 10-fold cross-validation. In this way, every individual speech expert was modeled with a correlation coefficient of r > .75. The mean opinion of all raters is predicted with a correlation of r = .90 for the laryngectomees and r = .86 for the children.
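The abstract states that features are selected by their mean rank over a 10-fold cross-validation and that agreement is measured with the correlation coefficient r. A minimal sketch of that bookkeeping follows, using only the Python standard library. Several details are assumptions not given in the abstract: each fold ranks features by the absolute Pearson correlation with the target on its training portion, folds are contiguous and unshuffled, and the mean opinion is the plain average over raters. All function names are hypothetical, and the authors' actual ranking criterion and regression model may differ.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences.
    Returns 0.0 if either sequence has zero variance (assumed convention)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def mean_opinion(expert_scores):
    """Average all experts' scores per speaker.
    expert_scores: one list per expert, each holding one score per speaker."""
    return [mean(per_speaker) for per_speaker in zip(*expert_scores)]


def fold_indices(n, k=10):
    """Split speaker indices 0..n-1 into k contiguous folds (no shuffling)."""
    return [list(range(i * n // k, (i + 1) * n // k)) for i in range(k)]


def mean_feature_ranks(features, target, k=10):
    """Rank features in each training split by |r| with the target
    (rank 1 = strongest) and average each feature's rank over the k folds.

    features: dict mapping a (hypothetical) feature name to per-speaker values.
    Returns (mean_rank, name) pairs sorted from best to worst."""
    n = len(target)
    ranks = {name: [] for name in features}
    for test in fold_indices(n, k):
        held_out = set(test)
        train = [i for i in range(n) if i not in held_out]
        y = [target[i] for i in train]
        # Sort feature names by descending absolute correlation on this split.
        ordered = sorted(
            features,
            key=lambda name: -abs(pearson_r([features[name][i] for i in train], y)),
        )
        for rank, name in enumerate(ordered, start=1):
            ranks[name].append(rank)
    return sorted((mean(r), name) for name, r in ranks.items())
```

A feature that ranks first in every fold ends with mean rank 1. Averaging ranks over folds, rather than ranking once on all data, makes the selection less dependent on which speakers happen to fall into a single split.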
This work was supported by the Johannes-und-Frieda-Marohn Stiftung and the Deutsche Forschungsgemeinschaft (German Research Foundation) under grant SCHU2320/1-1.
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Maier, A., Haderlein, T., Schuster, M., Nkenke, E., Nöth, E. (2007). Intelligibility Is More Than a Single Word: Quantification of Speech Intelligibility by ASR and Prosody. In: Matoušek, V., Mautner, P. (eds) Text, Speech and Dialogue. TSD 2007. Lecture Notes in Computer Science, vol 4629. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74628-7_37
DOI: https://doi.org/10.1007/978-3-540-74628-7_37
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74627-0
Online ISBN: 978-3-540-74628-7
eBook Packages: Computer Science, Computer Science (R0)