Abstract
Building accurate computational models of the prosody of spontaneous speech is a daunting enterprise, because speech produced without a carefully devised written script does not readily allow the explicit control and repeated observation that read "lab speech" corpora are designed to provide. The prosody of spontaneous speech is profoundly affected by the social and rhetorical context of the recording, and these contextual factors can themselves vary widely in ways beyond our current understanding and control. As a result, there are many types of spontaneous speech, which differ substantially not just from lab speech but also from each other. This paper motivates the study of spontaneous speech by describing several important aspects of prosody and its function that cannot be studied fully in lab speech, either because the relevant phenomena do not occur at all in lab speech or because they occur there in only a limited range of types. It then lists and characterizes some kinds of spontaneous speech that have been successfully recorded and analysed by scientists working on some of these aspects of prosody or on related discourse phenomena.
Copyright information
© 1997 Springer-Verlag New York, Inc.
About this chapter
Cite this chapter
Beckman, M.E. (1997). A Typology of Spontaneous Speech. In: Sagisaka, Y., Campbell, N., Higuchi, N. (eds) Computing Prosody. Springer, New York, NY. https://doi.org/10.1007/978-1-4612-2258-3_2
DOI: https://doi.org/10.1007/978-1-4612-2258-3_2
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4612-7476-6
Online ISBN: 978-1-4612-2258-3
eBook Packages: Springer Book Archive