Elsevier

Journal of Phonetics

Volume 34, Issue 3, July 2006, Pages 319-342

Tonal features, intensity, and word order in the perception of prominence

https://doi.org/10.1016/j.wocn.2005.06.004

Abstract

The perception of prominence as a function of sentence stress in Finnish was investigated in four experiments. Listeners judged the relative prominence of two consecutive nouns in a three-word utterance in which the accentuation of the nouns was systematically varied by tonal means. Experiments 1 and 2 investigated both the tonal features underlying the subjects’ responses and the influence of word order on the perceived prominence of the two accented words. The results showed that, regardless of other phonetic differences, similar tonal features conditioned the subjects’ judgments of prominence. They further showed that changing the word order influenced the distribution of responses in the two experiments. Two further experiments were conducted to check the possible influence of slight tonal and intensity differences in the first two experiments. Only intensity was found to affect the distribution of judgments, and this influence was local, affecting only the last of the two words. Overall, the results suggest that the most important tonal features responsible for the perception of prominence form a so-called flat-hat pattern. They also indicate that different kinds of focus structure influence the perception of prominence even when the judgments are based on decisions about the place of sentence stress.

Introduction

In phonetics, it is well established that linguistic knowledge can influence phonetic perception in a top-down manner. This is perhaps most evident in phonemic perception, where listeners recover canonical phonemes even when these overlap and are possibly blended through assimilation. Top-down processing has also been demonstrated in phonemic restoration, a particularly powerful auditory illusion in which listeners “hear” parts of words that are not actually present (Samuel, 1981). The underlying linguistic factors range from phonological to pragmatic. Moreover, studies of second language learning have shown that native-language sounds are perceived more easily than sounds acquired later in life through a second language (see Hume & Johnson, 2003, and references therein). That is, listeners interpret similar phonetic structures and units differently depending on their linguistic knowledge.

It can therefore be hypothesized that a similar perceptual influence should be found in less discrete linguistic and phonetic phenomena, such as prominence. Indeed, Eriksson, Thunberg, and Traunmüller (2001) found such an effect for syllable prominence: in their study, linguistically motivated factors explained the prominence ratings of syllables better than signal-based cues did (linguistic factors, 57%; signal-based factors, 48%). The influence of linguistic categories on phonetic perception has consequences for any study of the perception of (prosodic or syllabic) prominence. Top-down processing must be taken into account whenever a seemingly identical prosodic structure can occur with distinctly different syntactic structures. This is the case, for instance, in Finnish, where the relatively free word order can be used for pragmatic purposes, for example to bring a given constituent of an utterance into focus without changing the prosodic structure at all. We may therefore expect a sentence with unmarked word order, such as “Menemme laivalla Lemille” (We go by boat to Lemi), to be perceived differently with respect to prominence depending on the order of the two adverbials, laivalla and Lemille. In other words, changing the word order to “Lemille laivalla” (to Lemi by boat), with emphatic or contrastive focus on the last word, should be reflected in how prominent the two adverbials are perceived to be.

We conducted a series of four experiments to study the perception of prominence in two-accent utterances in Finnish. We were interested in the characteristic tonal features of the intonation contour that modulate the perception of prominence, and in whether these features remain the same across the different information structures expressed by different word-order permutations. More importantly, we were interested in whether word order influences the perception of prominence in such utterances.

It is well known that speakers can vary the prominence of pitch accents by varying the height of the associated fundamental frequency (f0) maxima to express different degrees of emphasis (Gussenhoven, Repp, Rietveld, Rump, & Terken, 1997, p. 3009). Listeners respond to these changes accordingly. That is, the perceived prominence of an accented syllable depends both on the height of its f0 maximum and on the relation of that local maximum to other maxima in the utterance. For instance, it has been shown that a later f0 peak in an utterance must be lower than the preceding ones to be perceived as equally high (see, for instance, Pierrehumbert, 1979, for English; Gussenhoven et al., 1997, for Dutch; and Vainio, Mixdorff, & Järvikivi, 2003, for Finnish). Pierrehumbert (1979) explains this by postulating a mental representation of declination that listeners use to normalize for the physically conditioned declination of f0.
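The normalization idea can be illustrated with a toy calculation. The sketch below assumes, purely for illustration, that the listener’s declination reference is a straight line in Hz with a fixed slope; the slope, peak times, and peak values are hypothetical and are not taken from this study or from Pierrehumbert (1979), and the function names are invented for the example.

```python
def expected_later_peak(f0_first_hz: float, t_first_s: float,
                        t_later_s: float, slope_hz_per_s: float) -> float:
    """f0 a later peak needs in order to sound as high as the first peak,
    assuming a linear declination reference (hypothetical model)."""
    return f0_first_hz - slope_hz_per_s * (t_later_s - t_first_s)


def normalized_difference(f0_first_hz: float, t_first_s: float,
                          f0_later_hz: float, t_later_s: float,
                          slope_hz_per_s: float) -> float:
    """Later peak minus its declination-adjusted target.
    > 0: the later peak should sound higher than the first one;
    < 0: it should sound lower; 0: equally high."""
    target = expected_later_peak(f0_first_hz, t_first_s, t_later_s, slope_hz_per_s)
    return f0_later_hz - target


if __name__ == "__main__":
    # Hypothetical values: first peak 130 Hz at 0.4 s, later peak 125 Hz at 1.2 s,
    # reference line assumed to fall by 10 Hz per second.
    diff = normalized_difference(130.0, 0.4, 125.0, 1.2, 10.0)
    print(f"{diff:+.1f} Hz")  # prints "+3.0 Hz": physically lower, yet heard as higher
```

With a zero slope the model reduces to comparing raw peak heights; the declination term is what allows a physically lower later peak to be heard as equally high or higher.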

Terken and Hermes (2000) noted that we currently lack sufficient knowledge to determine whether the perception of accent strength varies gradiently or not, although the results of many experiments seem to support the assumption that the perception of prominence is, in fact, gradient. But if we view prominence as (partially) reflecting a linguistic category, such as focus, rather than as a gradually varying phonetic phenomenon, we may assume that its perception becomes categorically interpretable. The situation is much the same as with formants, which seem to give rise to categorical perception when stop place perception in CV syllables is studied, but are perceived gradiently when they are studied directly as acoustic entities (Blumstein & Stevens, 1980) or in vowel perception in general (Winkler et al., 1999). Thus, if we consider focus to be a discrete linguistic phenomenon, we must assume that its perception is categorically interpretable, in the sense that it divides the perceptual space at some point. We can therefore also assume that focus, as a linguistic category, influences the perception of prominence in much the same way that phonemes (or different combinations of phonological features) influence the perception of segmental phonetic variables.

The categorical nature of intonation has been studied relatively little, but some evidence has been found that certain intonational phenomena are perceived categorically. This evidence, however, is somewhat conflicting. Remijsen and van Heuven (2003) found evidence for categorical perception of Dutch boundary tones signaling statements and questions. In contrast, Ladd and Morton (1997) did not find such evidence for “normal” and “emphatic” accent peaks in English, although they, too, found that the utterances were interpreted categorically. In any case, the purpose of the present study is not to investigate categorical perception per se, which is itself a controversial issue, but rather to establish whether categorical interpretations of gradient prosodic variables are influenced by the linguistic or information structure of the utterance.
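One common way to summarize such categorical interpretations is a crossover point: the stimulus level at which listeners are equally likely to judge either word as the more prominent one. The sketch below estimates it by linear interpolation of response proportions and compares two conditions; the response proportions are invented toy numbers, not results from the present experiments, and linear interpolation is a simple stand-in for fitting, for example, a logistic psychometric function.

```python
from typing import Sequence


def crossover(levels: Sequence[float], p_second: Sequence[float]) -> float:
    """Stimulus level where the proportion of 'second word more prominent'
    responses first reaches 0.5, by linear interpolation between adjacent
    steps. `levels` must be sorted in ascending order."""
    if len(levels) != len(p_second) or len(levels) < 2:
        raise ValueError("need two equal-length sequences of at least two points")
    for i in range(len(levels) - 1):
        x0, x1 = levels[i], levels[i + 1]
        y0, y1 = p_second[i], p_second[i + 1]
        if y0 == 0.5:
            return x0
        if (y0 - 0.5) * (y1 - 0.5) < 0:  # 0.5 lies strictly between y0 and y1
            return x0 + (0.5 - y0) * (x1 - x0) / (y1 - y0)
    if p_second[-1] == 0.5:
        return levels[-1]
    raise ValueError("responses never cross 0.5")


if __name__ == "__main__":
    # Toy data: f0 of the second peak (Hz) and the proportion of
    # 'second word more prominent' responses under two hypothetical word orders.
    levels = [100, 110, 120, 130, 140]
    order_a = [0.10, 0.30, 0.60, 0.80, 0.90]
    order_b = [0.05, 0.20, 0.40, 0.70, 0.90]
    xa, xb = crossover(levels, order_a), crossover(levels, order_b)
    print(f"order A: {xa:.1f} Hz, order B: {xb:.1f} Hz, shift: {xb - xa:.1f} Hz")
```

If the tonal continuum is identical in both conditions, a shift in the crossover point between word orders is one way to express an influence of information structure on otherwise identical prosodic input.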

Apart from grammatical relations proper, both the relative order of constituents within a sentence and its phonology can be used to convey how information is distributed within the sentence. This distribution is referred to as information structure. An important part of information structure concerns the distinction between new and old (given) information. Although the terminology varies considerably, the given or presupposed information is traditionally referred to as the topic of the sentence. Focus, in contrast, usually refers to what is new, that is, to what is not part of what is pragmatically presupposed (e.g., Van Valin & La Polla, 1997). Often, however, what matters is not simply whether a particular referent is “old” or “new”. Instead, it is often the relationship between a focused referent (“new” information) and what is pragmatically presupposed that makes the focused referent informative, not the fact that it is newly introduced. In Finnish, for example, the syntactically free word order can be manipulated to serve information structure. Thus, in an unmarked case such as “Menimme laivalla Lemille” (We went by boat to Lemi), the canonical order of the two adverbials (manner + place) in the adverbial phrase conforms to its default information structure, and the phrase as a whole can be said to be under so-called sentence focus (Van Valin & La Polla, 1997), whose prosodic counterpart is broad focus. Consequently, the word order evokes no pragmatic presuppositions.
Changing the word order to the marked “Menimme Lemille laivalla”, in contrast, presupposes that we did in fact go to Lemi; here the word order is used to emphasize, or focus, the fact that it was by boat (and not by car) that we went to Lemi, as if answering the question “How did you go to Lemi?” (for the pragmatic use of word order in Finnish, see, e.g., Hakulinen & Karlsson, 1979, and Vilkuna, 1989). Apart from word order, another means is generally available for placing any constituent within the domain of focus, even in the unmarked case: prosody. Focus can be achieved prosodically by increasing the accent or stress on the part of the utterance that is to be brought into focus. In Finnish, any constituent can be focused by prosodic means: thus a Finnish speaker can say “Manne meni Lemille” (“Manne went to Lemi”) as well as “Manne meni Lemille” (“Manne went to Lemi”; italics depict prosodic focus). It is therefore of interest how the two main means of marking focus, syntactic and prosodic, affect whether one part of an utterance is perceived as more or less prominent than the others.

In the present paper, the influence of accent strength and word order on prominence perception was studied in a series of perception experiments. The experiments described here are in line with a series of somewhat similar studies reported by, e.g., Pierrehumbert (1979), Gussenhoven and Rietveld (1988), Terken (1994), Ladd, Verhoeven, and Jacobs (1994), and Gussenhoven et al. (1997), which deal with the perception of prominence in utterances with two accented words realized as two f0 peaks on the accented syllables. In this paper, we use the term prominence to refer to the auditory salience of a phonetic or linguistic unit, and sentence stress to refer to utterance-level prominence relations between words. In the framework of our study, sentence stress can be seen to signal emphatic or contrastive focus.
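The two-peak paradigm shared by these studies can be illustrated with a schematic contour: a declining baseline with two accent peaks whose heights are the experimental variables. The sketch below is a simplified piecewise-linear illustration under assumed parameter values (baseline, slope, peak shape, and timing are all hypothetical); it is not the stimulus-generation method of this or any of the cited studies.

```python
from typing import List, Sequence, Tuple

Peak = Tuple[float, float]  # (peak time in s, peak height above baseline in Hz)


def baseline_hz(t_s: float, start_hz: float = 110.0, slope_hz_per_s: float = -8.0) -> float:
    """Linearly declining f0 baseline (hypothetical values)."""
    return start_hz + slope_hz_per_s * t_s


def accent_hz(t_s: float, centre_s: float, height_hz: float, half_width_s: float = 0.15) -> float:
    """Triangular rise-fall accent centred on the accented syllable."""
    d = abs(t_s - centre_s)
    return height_hz * (1.0 - d / half_width_s) if d < half_width_s else 0.0


def contour(times_s: Sequence[float], peaks: Sequence[Peak]) -> List[float]:
    """f0 (Hz) at each time point: baseline plus the sum of all accent bumps."""
    return [baseline_hz(t) + sum(accent_hz(t, c, h) for c, h in peaks) for t in times_s]


if __name__ == "__main__":
    times = [i * 0.1 for i in range(16)]   # 0.0 ... 1.5 s
    two_peaks = [(0.5, 30.0), (1.1, 20.0)]  # first accent higher than the second
    for t, f0 in zip(times, contour(times, two_peaks)):
        print(f"{t:.1f} s  {f0:6.1f} Hz")
```

Varying only the second `height_hz` while keeping everything else fixed gives the kind of one-dimensional continuum on which relative prominence judgments are typically collected. A flat-hat pattern, in which f0 stays high between the two accents instead of returning to the baseline, would need a plateau segment rather than two separate bumps.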

The perception of prominence has generally been studied in relation to tonal features and their dynamics (see, for instance, Terken, 1989, 1994; Gussenhoven et al., 1997; Hermes, 1997; Terken & Hermes, 2000). Most of these studies attempt to relate f0 variation to perceived prominence in order to develop a metric for prominence (Gussenhoven et al., 1997). All of them make clear that listeners estimate the prominence of a pitch peak on the basis of the pitch characteristics of the surrounding contour (Gussenhoven et al., 1997). However, none of them explicitly examines the possibility that syntax and information structure may influence prosodic perception. In fact, some of the studies use delexicalized utterances and thus avoid the problem. Although this probably does not affect the published results, it may affect their interpretation: these studies do not take into account the possibility that factors other than signal-based ones influence listeners’ prominence estimates. The main difference between the present study and those listed above is that the earlier studies concentrated on prominence as a phonetic phenomenon, whereas in the present study we were interested in how both tonal means and word order give rise to prominence as it is realized through sentence stress or accent (depending on the terminology in use).

The role of other prosodic parameters, mainly intensity and segmental durations, in the perception of prominence has also been investigated, but much less systematically. In particular, the relative intensity within an utterance and its influence on the perception of prominence have not been studied as systematically as the tonal aspects of prosody. This is regrettable, especially since Batliner et al. (2001) showed, using principal components analysis, that duration and energy features are more important than f0 for accent classification in both English and German. There are, however, a number of studies relating intensity to prosodic focus and prominence. In a production study, Heldner (1996) found an intensity difference between focused and non-focused words that interacted with the position of the word in the sentence: there was only a slight intensity difference in medial position, but a stronger effect in final position. Sluijter and van Heuven (1996) showed that listeners used intensity as a cue to word-stress position, but to a lesser degree than, for example, duration. In an important paper, Pierrehumbert (1979) studied intensity (amplitude) in two of her perception experiments. She found that the amplitude effect was 1.5 Hz/dB with respect to the so-called crossover point, where the two f0 peaks were perceived as equally prominent. That is, increasing the amplitude of the last peak increased its prominence, so that the crossover point was lowered by 1.5 Hz for each additional dB of amplitude. She concluded that while intensity plays an important role in the perception of prominence, its effect is not as important as that of f0. How much of this holds for Finnish remains to be determined.
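The trade-off reported by Pierrehumbert (1979) can be written as a one-line linear rule. The sketch below assumes that the 1.5 Hz/dB rate holds linearly over the range of amplitude changes considered (an assumption made here for illustration only); the 125 Hz starting crossover point and the function name are hypothetical.

```python
HZ_PER_DB = 1.5  # amplitude effect on the crossover point reported by Pierrehumbert (1979)


def shifted_crossover(crossover_hz: float, gain_db: float, hz_per_db: float = HZ_PER_DB) -> float:
    """Crossover point after raising the last peak's amplitude by `gain_db` dB.
    A louder last peak needs less f0 to sound equally prominent, so the
    crossover point moves down by `hz_per_db` per dB (linear assumption)."""
    return crossover_hz - hz_per_db * gain_db


if __name__ == "__main__":
    # Hypothetical starting crossover point of 125 Hz.
    for gain in (0, 2, 4, 6):
        print(f"+{gain} dB -> crossover at {shifted_crossover(125.0, gain):.1f} Hz")
```

Under this rule a 4 dB boost on the last peak is worth 6 Hz of f0, which makes the relative weighting of the two cues explicit.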


Experiments

Experiments 1 and 2 were conducted to investigate the perception of prominence in Finnish. The first experiment laid the groundwork by examining the tonal features underlying prominence judgments, whereas the second was used to investigate the influence of syntactic structure, namely word order, on the perception of prominence.
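A stimulus set of this kind crosses a sentence frame with tonal manipulations of the two accented words. The sketch below enumerates such a set; the frame and the two adverbial orders come from the text, but the marked present-tense variant “Menemme Lemille laivalla” is constructed here by analogy, and the f0 peak levels are hypothetical placeholders rather than the values used in Experiments 1 and 2. Each stimulus is tagged with its word order so that the orders can be analyzed separately.

```python
from itertools import product
from typing import Dict, List, Sequence, Union

Stimulus = Dict[str, Union[str, int]]

# Adverbial orders from the text; the peak levels below are hypothetical.
WORD_ORDERS = ("laivalla Lemille", "Lemille laivalla")  # unmarked, marked
PEAK1_HZ = (120, 130, 140)
PEAK2_HZ = (110, 120, 130, 140)


def stimulus_grid(orders: Sequence[str], peak1_hz: Sequence[int],
                  peak2_hz: Sequence[int]) -> List[Stimulus]:
    """Full factorial crossing of word order x first-peak f0 x second-peak f0."""
    return [
        {"sentence": f"Menemme {order}", "order": order, "peak1_hz": p1, "peak2_hz": p2}
        for order, p1, p2 in product(orders, peak1_hz, peak2_hz)
    ]


if __name__ == "__main__":
    grid = stimulus_grid(WORD_ORDERS, PEAK1_HZ, PEAK2_HZ)
    print(len(grid), "stimuli")
    print(grid[0])
```

Keeping the tonal grid identical across word orders is what allows any difference in responses to be attributed to word order rather than to the prosodic manipulation itself.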

In Experiment 1, the sentence “Menemme laivalla Lemille” (We go by boat to Lemi) was used. The sentence permits four possible interpretations with respect to the location of focus:

  1. broad (or sentence)

General discussion

The present article has reported the results from four experiments investigating two distinct questions about the perception of prominence in Finnish: first, we investigated the tonal aspects and structures responsible for the perception of sentence stress in an utterance with potentially two stressed constituents (here single words). Second, we inquired into the possible influence of word order on the perception of relative prominence of the two constituents. Our aim, then, was to investigate

Conclusion

We conclude that there is a clear and measurable linguistic bias in the perception of prosodic prominence in Finnish. We base this conclusion on the results of Experiments 1 and 2, as well as on the fact that we ruled out all factors other than the difference in word order as explanations for the differences in the judgments of prominence of the first NP in the utterances. Furthermore, phonetic differences could not explain all of the differences in the perception of prominence of the second NP

Acknowledgements

We would like to thank Rachael-Anne Knight and three anonymous reviewers as well as Jukka Hyönä, Pirita Pyykkönen and Hanna Westerlund for their insightful comments on the manuscript. We also thank Stefan Werner and Hansjörg Mixdorff for their contributions to the discussions on this subject. The present study was supported by Grant No. 107606 from the Academy of Finland to M. Vainio and Grant No. 106418 from the Academy of Finland to J. Järvikivi.

References

  • Batliner, A., Buckow, J., Huber, R., Warnke, V., Nöth, E., & Niemann, H. (2001). Boiling down prosody for the...
  • Blumstein, S. E., et al. (1980). Perceptual invariance and onset spectra for stop consonants in different vowel environments. Journal of the Acoustical Society of America.
  • Bojar, O., Semecký, J., Vasishth, S., & Kruijff-Korbayová, I. (2004). Processing noncanonical word order in Czech, a...
  • Eriksson, A., Thunberg, G. C., & Traunmüller, H. (2001). Syllable prominence: A matter of vocal effort, phonetic...
  • Fujisaki, H., et al. (1984). Analysis of voice fundamental frequency contours for declarative sentences of Japanese. Journal of the Acoustical Society of Japan (E).
  • Gussenhoven, C., et al. (1997). The perceptual prominence of fundamental frequency peaks. Journal of the Acoustical Society of America.
  • Hakulinen, A., et al. (1979). Nykysuomen lauseoppia.
  • Terken, J. (1989). Reaction to C. Gussenhoven and A. Rietveld: Fundamental frequency declination in Dutch: Testing three hypotheses. Journal of Phonetics.
Both authors contributed equally to this paper.
