The Role of Prosody in the Perception of Synthesized and Natural Speech

  • Conference paper
Speech and Computer (SPECOM 2015)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 9319))

Abstract

This paper presents the results of research on the perception of synthesized and natural speech and investigates the role of pauses, a prosodic characteristic, in speech comprehension. The research involved a series of perception tasks: quality assessment, an intelligibility task, and comprehension tests of ten shorter texts and one longer text in Serbian, produced by the AlfaNum speech synthesizer and by a professional actor, followed by a comprehension task on synthesized speech with modified pauses. The results of the intelligibility task show similar performance by both groups of subjects, while the comprehension tasks indicate better performance for natural than for synthesized speech. The results of the follow-up task show that the modified prosody improved the subjects' performance. The quality assessment task revealed the subjects' preference for natural speech, mainly on the basis of the prosodic characteristic of pauses.

Notes

  1. Pauses in the form of silence are not the only significant indicators of IP boundaries; other common cues are the lengthening of final segments (pre-boundary lengthening) and the presence of a specific boundary tone.

  2. The results will be reported in detail in Sect. 3.

  3. The semantically unpredictable sentences (SUS) test is the methodology proposed in [6] and references therein as the most appropriate for assessing segmental intelligibility.

References

  1. Pisoni, D.B.: Perception of synthetic speech. In: van Santen, J.P.H., Sproat, R.W., Olive, J.P., Hirschberg, J. (eds.) Progress in Speech Synthesis, pp. 541–560. Springer, New York (1997)

  2. Pisoni, D.B.: Some measures of intelligibility and comprehension. In: Allen, J., Hunnicutt, M.S., Klatt, D.H. (eds.) From Text to Speech: The MITalk System, pp. 151–171. Cambridge University Press, Cambridge, UK (1987)

  3. Pisoni, D.B.: Speeded classification of natural and synthetic speech in a lexical decision task. J. Acoust. Soc. Am. 70, S98 (1981)

  4. Pisoni, D.B., Nusbaum, H., Greene, B.G.: Perception of synthetic speech generated by rule. In: Proceedings of the IEEE, pp. 1665–1676 (1985)

  5. Pols, L.C.W., Santen, J.P.H. van, Abe, M., Kahn, D., Keller, E.: The use of large text corpora for evaluation text-to-speech systems. In: Proceedings of the First International Conference on Language Resources and Evaluation, pp. 637–640. Granada, Spain (1998)

  6. Chang, Y.Y.: Evaluation of TTS systems in intelligibility and comprehension tasks. In: ROCLING, Proceedings of the 23rd Conference on Computational Linguistics and Speech Processing, Taipei, Taiwan, pp. 64–78 (2011)

  7. Warren, R.M.: Perceptual restoration of missing speech sounds. Science 167, 392–393 (1970)

  8. Warren, R.M., Obusek, C.: Speech perception and phonemic restorations. Percept. Psychophys. 9, 358–363 (1971)

  9. Selkirk, E.: Phonology and Syntax: The Relation Between Sound and Structure. MIT Press, Cambridge (1984)

  10. Kjelgaard, M.M., Speer, S.R.: Prosodic facilitation and interference in the resolution of temporary syntactic closure ambiguity. J. Mem. Lang. 40, 153–194 (1999)

  11. Swerts, M., Geluykens, R.: Prosody as a marker of information flow in spoken discourse. Lang. Speech 37, 21–45 (1994)

  12. Hirschberg, J.: Communication and prosody: functional aspects of prosody. Speech Commun. (Special Issue on Dialogue and Prosody) 36, 31–43 (2001)

  13. Shriberg, E., Stolcke, A., Hakkani-Tür, D., Tür, G.: Prosody-based automatic segmentation of speech into sentences and topics. Speech Commun. 32, 127–154 (2000)

  14. Cutler, A., Dahan, D., van Donselaar, W.: Prosody in the comprehension of spoken language: a literature review. Lang. Speech 40, 141–201 (1997)

  15. Swerts, M., Geluykens, R.: Local and global prosodic cues to discourse organization in dialogues. In: Proceedings of the ESCA Workshop on Prosody, pp. 108–111. Lund, Sweden (1993)

  16. Tench, P.: The Intonation System of English. Cassell, London (1996)

  17. Jonides, J., Lewis, R.L., Nee, D.E., Lustig, C.A., Berman, M.G., Moore, K.S.: The mind and brain of short-term memory. Annu. Rev. Psychol. 59, 193–224 (2008)

Acknowledgments

The presented study was financed by the Ministry of Education and Science of the Republic of Serbia under the Research grant TR32035.

Corresponding author

Correspondence to Bojana Jakovljević.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Marković, M., Jakovljević, B., Milićev, T., Miliević, N. (2015). The Role of Prosody in the Perception of Synthesized and Natural Speech. In: Ronzhin, A., Potapova, R., Fakotakis, N. (eds) Speech and Computer. SPECOM 2015. Lecture Notes in Computer Science, vol 9319. Springer, Cham. https://doi.org/10.1007/978-3-319-23132-7_55

  • DOI: https://doi.org/10.1007/978-3-319-23132-7_55

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-23131-0

  • Online ISBN: 978-3-319-23132-7

  • eBook Packages: Computer Science, Computer Science (R0)
