A Typology of Spontaneous Speech

Computing Prosody

Abstract

Building accurate computational models of the prosody of spontaneous speech is a daunting enterprise because speech produced without a carefully devised written script does not readily allow the explicit control and repeated observation that read “lab speech” corpora are designed to provide. The prosody of spontaneous speech is affected profoundly by the social and rhetorical context of the recording, and these contextual factors can themselves vary widely in ways beyond our current understanding and control. As a result, there are many types of spontaneous speech, which differ substantially not just from lab speech but also from each other. This paper motivates the study of spontaneous speech by describing several important aspects of prosody and its function that cannot be studied fully in lab speech, either because the relevant phenomena do not occur at all in lab speech or because they occur there only in a limited range of types. It then lists and characterizes some kinds of spontaneous speech that have been successfully recorded and analysed by scientists working on some of these aspects of prosody or on related discourse phenomena.

Copyright information

© 1997 Springer-Verlag New York, Inc.

About this chapter

Cite this chapter

Beckman, M.E. (1997). A Typology of Spontaneous Speech. In: Sagisaka, Y., Campbell, N., Higuchi, N. (eds) Computing Prosody. Springer, New York, NY. https://doi.org/10.1007/978-1-4612-2258-3_2

  • DOI: https://doi.org/10.1007/978-1-4612-2258-3_2

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4612-7476-6

  • Online ISBN: 978-1-4612-2258-3

  • eBook Packages: Springer Book Archive