
Prosody Evaluation as a Diagnostic Process: Subjective vs. Objective Measurements

International Journal of Speech Technology

Abstract

A set of perception experiments, using reiterant and lexicalised speech, was designed to diagnose the relative involvement of prosody in the segmentation and hierarchisation of speech. Both natural and synthetic intonation were evaluated. Two distance measures, correlation and root-mean-square distance, were then computed on the acoustic parameters (F0, syllabic duration and intensity) and compared with the perception results. This objective vs. subjective comparison highlights which acoustic cues listeners use to judge how adequately prosody performs a given function such as demarcation. The results can be summarised as a scale of perceptual distance between two demarcation functions. The study also shows that listeners can retrieve pertinent information from purely prosodic stimuli.
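The two objective measures named in the abstract can be illustrated with a minimal sketch. The abstract does not say how the contours were sampled or aligned, so the code below assumes one value per syllable and equal-length sequences. The function names and the F0 values are hypothetical and are not taken from the article.

```python
import math


def pearson_correlation(x, y):
    """Pearson correlation between two equal-length parameter contours."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("contours must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    denom = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0:
        raise ValueError("correlation is undefined for a constant contour")
    return sum(a * b for a, b in zip(dx, dy)) / denom


def rms_distance(x, y):
    """Root-mean-square distance between two equal-length contours."""
    if len(x) != len(y) or not x:
        raise ValueError("contours must be non-empty and of equal length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


# Hypothetical per-syllable F0 values in Hz (illustrative only).
natural_f0 = [120.0, 135.0, 150.0, 140.0, 110.0]
synthetic_f0 = [118.0, 130.0, 155.0, 138.0, 115.0]

print("correlation:", pearson_correlation(natural_f0, synthetic_f0))
print("RMS distance (Hz):", rms_distance(natural_f0, synthetic_f0))
```

Correlation compares the shape of two contours regardless of offset and scale, whereas the RMS distance also penalises differences in absolute level. The same two functions would apply unchanged to syllabic duration or intensity sequences.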

Rilliard, A., Aubergé, V. Prosody Evaluation as a Diagnostic Process: Subjective vs. Objective Measurements. International Journal of Speech Technology 6, 409–418 (2003). https://doi.org/10.1023/A:1025717202812

  • DOI: https://doi.org/10.1023/A:1025717202812
