Abstract
This paper presents the results of a study of the perception of synthesized and natural speech and investigates the role of one prosodic characteristic, pauses, in speech comprehension. The research involved a series of perception tasks: a quality assessment, an intelligibility task, and comprehension tests on ten shorter texts and one longer text in Serbian, produced by the AlfaNum speech synthesizer and by a professional actor, followed by a comprehension task on synthesized speech with modified pauses. The results of the intelligibility task show similar performance by both groups of subjects, while the comprehension tasks indicate better performance for natural than for synthesized speech. The results of the follow-up task show that the modified prosody improved the subjects' performance. The quality assessment task revealed the subjects' preference for natural speech, mainly on the basis of the prosodic characteristic of pauses.
Notes
1. Pauses in the form of silence are not the only significant indicators of IP boundaries; the other common cues of IP boundaries are the lengthening of final segments (pre-boundary lengthening) and the presence of a specific boundary tone.
2. The results will be reported in detail in Sect. 3.
3. SUS is the methodology proposed as the most appropriate for assessing segmental intelligibility in [6] and references therein.
References
Pisoni, D.B.: Perception of synthetic speech. In: van Santen, J.P.H., Sproat, R.W., Olive, J.P., Hirschberg, J. (eds.) Progress in Speech Synthesis, pp. 541–560. Springer, New York (1997)
Pisoni, D.B.: Some measures of intelligibility and comprehension. In: Allen, J., Hunnicutt, M.S., Klatt, D.H. (eds.) From Text to Speech: The MITalk System, pp. 151–171. Cambridge University Press, Cambridge, UK (1987)
Pisoni, D.B.: Speeded classification of natural and synthetic speech in a lexical decision task. J. Acoust. Soc. Am. 70, S98 (1981)
Pisoni, D.B., Nusbaum, H., Greene, B.G.: Perception of synthetic speech generated by rule. In: Proceedings of the IEEE, pp. 1665–1676 (1985)
Pols, L.C.W., van Santen, J.P.H., Abe, M., Kahn, D., Keller, E.: The use of large text corpora for evaluating text-to-speech systems. In: Proceedings of the First International Conference on Language Resources and Evaluation, pp. 637–640. Granada, Spain (1998)
Chang, Y.Y.: Evaluation of TTS systems in intelligibility and comprehension tasks. In: ROCLING, Proceedings of the 23rd Conference on Computational Linguistics and Speech Processing, Taipei, Taiwan, pp. 64–78 (2011)
Warren, R.M.: Perceptual restoration of missing speech sounds. Science 167, 392–393 (1970)
Warren, R.M., Obusek, C.: Speech perception and phonemic restorations. Percept. Psychophys. 9, 358–363 (1971)
Selkirk, E.: Phonology and Syntax: The Relation Between Sound and Structure. MIT Press, Cambridge (1984)
Kjelgaard, M.M., Speer, S.R.: Prosodic facilitation and interference in the resolution of temporary syntactic closure ambiguity. J. Mem. Lang. 40, 153–194 (1999)
Swerts, M., Geluykens, R.: Prosody as a marker of information flow in spoken discourse. Lang. Speech 37, 21–45 (1994)
Hirschberg, J.: Communication and prosody: functional aspects of prosody. Speech Commun. (Special Issue on Dialogue and Prosody) 36, 31–43 (2001)
Shriberg, E., Stolcke, A., Hakkani-Tür, D., Tür, G.: Prosody-based automatic segmentation of speech into sentences and topics. Speech Commun. 32, 127–154 (2000)
Cutler, A., Dahan, D., van Donselaar, W.: Prosody in the comprehension of spoken language: a literature review. Lang. Speech 40, 141–201 (1997)
Swerts, M., Geluykens, R.: Local and global prosodic cues to discourse organization in dialogues. In: Proceedings of the ESCA Workshop on Prosody, pp. 108–111. Lund, Sweden (1993)
Tench, P.: The Intonation System of English. Cassell, London (1996)
Jonides, J., Lewis, R.L., Nee, D.E., Lustig, C.A., Berman, M.G., Moore, K.S.: The mind and brain of short-term memory. Annu. Rev. Psychol. 59, 193–224 (2008)
Acknowledgments
The presented study was financed by the Ministry of Education and Science of the Republic of Serbia under the Research grant TR32035.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Marković, M., Jakovljević, B., Milićev, T., Milićević, N. (2015). The Role of Prosody in the Perception of Synthesized and Natural Speech. In: Ronzhin, A., Potapova, R., Fakotakis, N. (eds) Speech and Computer. SPECOM 2015. Lecture Notes in Computer Science, vol 9319. Springer, Cham. https://doi.org/10.1007/978-3-319-23132-7_55
DOI: https://doi.org/10.1007/978-3-319-23132-7_55
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23131-0
Online ISBN: 978-3-319-23132-7
eBook Packages: Computer Science (R0)