Elsevier

Speech Communication

Volume 33, Issue 4, March 2001, Pages 305-318
Accentuation boundaries in Dutch, French and Swedish

https://doi.org/10.1016/S0167-6393(00)00062-5

Abstract

This paper presents a comparative study investigating the relation between the timing of a rising or falling pitch movement and the temporal structure of the syllable it accentuates for three languages: Dutch, French and Swedish. In a perception experiment, the five-syllable utterances /mamamamama/ and /ʔaʔaʔaʔaʔa/ were provided with a relatively fast rising or falling pitch movement. The timing of the movement was systematically varied so that it accented the third or the fourth syllable. Subjects were asked to indicate which syllable they perceived as accented. The accentuation boundary (AB) between the third and the fourth syllable was then defined as the moment before which more than half of the subjects indicated the third syllable as accented and after which more than half of the subjects indicated the fourth syllable. The results show that there are significant differences between the three languages as to the location of the AB. In general, for the rises, well-defined ABs were found. They were located in the middle of the vowel of the third syllable for French subjects, and later in that vowel for Dutch and Swedish subjects. For the falls, a clear AB was obtained only for the Dutch and the Swedish listeners. This was located at the end of the third syllable. For the French listeners, the fall did not yield a clear AB. This corroborates the absence of accentuation by means of falls in French. By varying the duration of the pitch movement it could be shown that, in all cases in which a clear AB was found, the cue for accentuation was located at the beginning of the pitch movement.

Zusammenfassung (English translation)

This article presents a comparative study of three languages, Dutch, French and Swedish, examining the relation between the timing of a rising or falling pitch movement and the temporal structure of the syllable it accentuates. In a perception experiment, the five-syllable utterances /mamamamama/ and /ʔaʔaʔaʔaʔa/ were presented with relatively fast rising or falling pitch movements. The timing of the movement was varied systematically so that the third or the fourth syllable received the accent. Subjects were asked to mark the syllable they had perceived as accented. An accentuation boundary (AB) between the third and the fourth syllable was then defined as the moment before which more than half of the subjects reported the third syllable as accented and after which more than half reported the fourth syllable as accented. The results show significant differences between the three languages in the placement of the AB. In general, clearly defined ABs were found for the rising movements. For the French subjects they lay in the middle of the vowel of the third syllable, and for the Dutch and Swedish subjects at a later point in the same vowel. For the falling movements, a clear AB was obtained only for the Dutch and Swedish subjects; it lay at the end of the third syllable. The French subjects showed no clear AB for a falling movement. This confirms that accentuation in French is not marked by a falling pitch movement. By varying the duration of the pitch movement, it could be shown that, in all cases with a clear AB, the cue for accentuation was located at the beginning of the pitch movement.

Résumé (English translation)

This article presents a comparative study of the relation between the timing of a rising or falling pitch movement and the syllable it accentuates in three languages: Dutch, French and Swedish. In a perception experiment, five-syllable stimuli /mamamamama/ and /ʔaʔaʔaʔaʔa/ were synthesized with a relatively fast rising or falling pitch movement. The timing of the movement was varied systematically so as to accent the third or the fourth syllable. Subjects were asked to indicate the syllable they perceived as accented. The accentuation boundary (AB) between the third and the fourth syllable is defined as the moment before which more than half of the subjects indicate the third syllable as accented and after which more than half indicate the fourth. The results show significant differences between the three languages in the position of the AB. In general, the ABs are clearly defined for rising pitch movements. They lie in the middle of the vowel of the third syllable for the French subjects, and later in the vowel for the Swedish and Dutch subjects. For falling pitch movements, a clearly defined AB is obtained only for the Swedish and Dutch subjects; it lies at the end of the third syllable. For the French subjects, by contrast, this type of movement does not allow an AB to be defined. This confirms the absence of accentuation by falling pitch movements in French. By varying the duration of the pitch movement, it could be shown that, in the cases where the AB is clearly defined, the notion of accentuation is tied to the perception of a pitch variation located at the beginning of the movement.

Introduction

Temporal alignment between prosodic and segmental structure can be defined in different ways. First, it can be defined as the synchrony between events, in this case tonal events and segmental events. Second, it can be defined as overlap between segments (Sagey, 1988), e.g., between part of the pitch movement and the voiced segment of the syllable. Third, alignment can be described more figuratively as a hook and an eye (Caspers, 1994, p. 84), i.e., some aspect of the pitch movement, the hook, is attached to some aspect of the syllable, the eye. This last possibility implies a fixed temporal relationship between the tonal and the segmental event. This relationship need not be a synchrony or an overlap. It can be looser, in the sense that the tonal event occurs somewhere near the accented word, or stricter, in the sense that the tonal event is precisely timed with respect to a well-defined segmental position such as the vowel onset or the end of voicing.

The following paragraphs briefly summarize the accentuation structure of Dutch, Swedish and French. For Dutch rises, 't Hart et al. (1991) distinguish two kinds of accent-lending movements: an early rise starting before the vowel onset and a late rise starting after the vowel onset. Similar results have been obtained for English (Hill and Reid, 1977). These categories may be associated with L+H* and L*+H in autosegmental terminology (Beckman and Pierrehumbert, 1986). In a 2-alternative forced-choice experiment, Hermes (1997) showed that Dutch subjects can reliably label an accent lent by a rise as high or low depending on whether the rise starts earlier or later in the syllable. This so-called high-low boundary was located just after the vowel onset.

For falls, 't Hart et al. (1991) present only one phonetic category of full-sized accent-lending falls, while Rietveld and Gussenhoven (1995) distinguish two phonological categories, H*L and !H*L, for Dutch. In the same 2-alternative forced-choice experiment as discussed for the rises, Hermes (1997) showed that Dutch subjects can reliably label an accent lent by a fall as high or low depending on whether the fall starts earlier or later in the syllable. Here, the high-low boundary was again located just after the vowel onset, but somewhat earlier than for the rise. Caspers and Van Heuven (personal communication) indeed showed that, for Dutch, two linguistically different falls can be distinguished.

For Swedish, the rise is coupled to focal accent and is phonologically separate from the two different word-accent falls which are described as H+L* (acute accent) and H*+L (grave accent) (Bruce, 1977).

For French, only a small number of studies have investigated the timing of the accent-lending pitch movement in relation to the vowel onset (Vaissière, 1980; Beaugendre, 1994). These studies claim that falls do not have a clear function in the accentual structure of French. It can thus be predicted that French subjects will have difficulty forming a clear category for the location of the accent with falling movements. For rises, the melodic variation for a primary accent is defined as a late rise (Rf or R1) and the melodic variation for a secondary accent as an early rise (Ri or R2); the labels correspond to the notation of Vaissière (1980) and Beaugendre (1994), respectively.

These different categories of accent-lending falls and rises are related to a model of optimal tonal perception proposed by House (1990) who investigated the importance of segmental spectral information for the perception of tonal movement. In the model, tonal movement early in the vowel (through areas of changing spectral and intensity characteristics) is coded as level features and perceived as a pitch jump from a preceding level. Movement later in the vowel (through areas of spectral stability) is coded as contour features and perceived as pitch movement.

Temporal alignment matters because several studies classify pitch movements on the basis of their alignment within the syllable. An overview of the relevant literature is given by Hermes (1997), who extended the pioneering work on Dutch accentuation carried out at IPO in the late 1960s by van Katwijk and Govaert (1967) and Collier (1970).

This recent work on accentuation in Dutch (Hermes, 1997; Hermes et al., 1997b) has shown, first, that the main cue inducing the percept of accentuation is located at the onset of a pitch movement and, second, that the point at which the percept of accentuation shifts from one syllable to the next lies near the vowel offset or, when the vowel is followed by a voiced consonant, somewhat later. On the basis of these results, most of the phenomena found in the early IPO studies could be understood.

In this paper, the same kind of experiment as reported for Dutch subjects by Hermes (1997) is conducted, but now native speakers of two other Western European languages are also included: French and Swedish. Furthermore, the reiterant speech stimuli consist not only of the syllable /ma/, as in Hermes (1997), but also of the syllable /ʔa/. These last stimuli were included in order to guarantee that the pitch information is limited to the vowel of the syllable.

In the perception experiment used for the present study, five-syllable utterances of the type /mamamamama/ and /ʔaʔaʔaʔaʔa/ were provided with a relatively fast rising or falling pitch movement. The timing of the pitch movements was systematically varied in such a way that they accented the third or the fourth syllable. Subjects were asked to indicate which syllable they perceived as accented. The accentuation boundary (AB) was then defined as the moment before which more than half of the subjects indicated the third syllable as accented and after which more than half of the subjects indicated the fourth syllable. In order to find out where in the pitch movement the cue which induces the percept of accentuation is located (at the beginning, the end or somewhere in between), the duration of the pitch movement was varied. If the pitch cue that induces the percept of accentuation is at the onset of a pitch movement, the location of the onset of the pitch movement at the AB will be independent of the duration of the pitch movement. However, because the human ear integrates over time, the pitch cue cannot lie exactly at the beginning of the movement but must be slightly later. If the pitch movement gets longer, and the change in the course of pitch is less abrupt, the accent-lending cue will therefore be somewhat weaker. The start of the pitch movement then has to be slightly earlier in order to guarantee that the cue is audible at the AB. As a direct consequence, the offset of the pitch movement at the AB will shift to the right as the duration of the pitch movement gets longer. This hypothesis is illustrated in Fig. 1(a) for the rises and in Fig. 1(b) for the falls. The hypothetical AB is indicated by the circle in the upper panel, and by the dashed–dotted line in the lower panel. In the lower panel, the onsets are indicated with small circles and the offsets with crosses.

If, on the contrary, the cue which accentuates the syllable is at the offset of the pitch movement, the offset of the pitch movement at the AB will be more or less independent of the duration of the movement. If the perceptual process which determines the moment at which the pitch movement has been completed has an integration time, the offset of the pitch movement at the AB might shift somewhat to the left as the pitch movement gets longer, but not to the right. The onset of the pitch movement at the AB will accordingly shift to the left as the duration of the pitch movement increases. This situation is illustrated in Fig. 1(d) for the rises and in Fig. 1(e) for the falls.
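
The AB definition and the two competing hypotheses can be made concrete with a short Python sketch. It is an illustration under stated assumptions, not the analysis used in the paper: the function names, the linear interpolation at the 50% point, the binary third-versus-fourth response coding and all numbers are hypothetical. The AB is taken as the onset time at which the fraction of "third syllable" responses falls through one half, and the cue is assigned to whichever end of the movement shifts less at the AB as the movement duration changes.

```python
def accentuation_boundary(onsets_ms, frac_third):
    """Onset time at which 'third syllable' responses fall through 50%.

    onsets_ms  : movement onset times, in increasing order
    frac_third : fraction of subjects reporting the third syllable
    Returns None when the responses never cross 50% (no clear AB).
    If the curve crosses more than once, the first crossing is returned.
    """
    points = list(zip(onsets_ms, frac_third))
    for (t0, p0), (t1, p1) in zip(points, points[1:]):
        if p0 > 0.5 >= p1:
            # Linear interpolation between the two bracketing stimuli
            # (an assumption made for this sketch).
            return t0 + (p0 - 0.5) * (t1 - t0) / (p0 - p1)
    return None


def slope(xs, ys):
    """Least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def cue_location(durations_ms, ab_onsets_ms):
    """Return 'onset' or 'offset', whichever moves less with duration."""
    ab_offsets_ms = [t + d for t, d in zip(ab_onsets_ms, durations_ms)]
    onset_slope = slope(durations_ms, ab_onsets_ms)
    offset_slope = slope(durations_ms, ab_offsets_ms)
    return "onset" if abs(onset_slope) < abs(offset_slope) else "offset"


# Hypothetical data, invented for illustration only.
ab = accentuation_boundary([0, 20, 40, 60], [1.0, 0.9, 0.3, 0.0])
print(round(ab, 1))                                  # 33.3
print(cue_location([80, 120, 160], [100, 96, 92]))   # onset
```

In the second call the onset at the AB moves slightly to the left and the offset clearly to the right as the movement gets longer, which is the pattern predicted when the cue sits at the onset.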

In order to find out to which segments of the syllable this AB is linked, the durations of the /m/'s in the /mamamamama/ stimuli and of the silences between the /a/'s in the /ʔaʔaʔaʔaʔa/ stimuli were varied. These silences correspond to the closures associated with the glottal stop /ʔ/. The rationale behind this experimental setup is described in more detail by Hermes (1997).
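
One way to read this design is sketched below in Python. The sketch is hypothetical and is not taken from the paper: when the consonant or silence duration is varied, the AB should keep a roughly constant distance to the segmental landmark it is linked to, so the landmark with the smallest spread in that distance is the best candidate. The function name, the landmark names and all times are assumptions made for illustration.

```python
from statistics import pstdev


def best_anchor(ab_ms, landmarks_ms):
    """Pick the landmark whose distance to the AB varies least.

    ab_ms        : AB time for each consonant-duration condition
    landmarks_ms : dict mapping a landmark name to its time per condition
    All times share one reference point (here: vowel onset of syllable 3).
    """
    spread = {}
    for name, times in landmarks_ms.items():
        distances = [ab - t for ab, t in zip(ab_ms, times)]
        # Population standard deviation of the AB-to-landmark distance.
        spread[name] = pstdev(distances)
    return min(spread, key=spread.get), spread


# Hypothetical values in ms for a short, normal and long /m/.
landmarks = {
    "vowel_offset_syll3": [150, 150, 150],
    "vowel_onset_syll4": [200, 250, 300],
}
anchor, spread = best_anchor([155, 160, 166], landmarks)
print(anchor)  # vowel_offset_syll3
```

With these invented numbers the AB stays within about 10 ms of the vowel offset of the third syllable while its distance to the next vowel onset changes by roughly 90 ms, so the vowel offset is selected.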

Finally, the choice of two different types of stimuli will allow us to check whether the nature of the segment between each vowel, /m/ (voiced consonant) versus "silence comprising a glottal stop" (unvoiced), will influence the location of the AB. Glottal stops were used instead of ordinary stops, because the ordinary stops are aspirated in English and German, which would make these stimuli too unnatural to be used for future study of these languages.

Stimuli

The experiments reported in the present paper were carried out with reiterant five-syllable utterances of the type /mamamamama/ and /ʔaʔaʔaʔaʔa/, where /ʔ/ indicates a silence comprising a glottal stop. The middle three syllables were exact replicas of each other in duration and spectral content. The stimuli were derived from the natural three-syllable utterances /mamama/ and /ʔaʔaʔa/, spoken with an accent on the second syllable. The middle syllable was triplicated and the original pitch

Results

The results for the /mamamamama/ stimuli with an /m/ of normal duration are presented on the left-hand side of Fig. 2 for the Swedish subjects, and on the right-hand side of Fig. 2 for the French subjects. In (a) and (d) the range of pitch movements of the stimuli is indicated: in the left panels for the rise and in the right panels for the fall. This is done for the pitch movement of 120 ms. In (b) and (e) the number of responses "3rd syllable accented" is presented as a function of the onset

Discussion

The first question we asked was whether in Swedish and French, as in Dutch, the cue for accentuation of an accent-lending pitch movement is at the onset of the movement. The locations of the 80, 120 and 160 ms movements for long, normal and short durations of /m/ and of the silence of the glottal stop are illustrated in Figs. 4, 5 and 6, respectively. Referring to the two alternative hypotheses illustrated in Fig. 1, this question can be answered positively. Indeed, with the exception of the

Acknowledgements

We would like to thank our subjects from LIMSI-CNRS, the Center for Research on User–System Interaction and Lund University for their patience and kindness in participating in these experiments. We would also like to thank Dr. Bernd Möbius and an anonymous reviewer for their critical reading of the previous version of this manuscript. Part of this work was conducted under a research training grant from the European Commission.

References (25)

  • Hill, D., et al., 1977. An experiment on the perception of intonational features. International Journal of Man-Machine Studies.
  • Rietveld, T., et al., 1995. Aligning pitch targets in speech synthesis: Effects of syllable structure. Journal of Phonetics.
  • Beaugendre, F., 1994. Une étude perceptive de l'intonation du français. Ph.D. Thesis, University of Paris XI, Orsay,...
  • Beckman, M., et al., 1986. Intonational structure in Japanese and English. Phonology Yearbook.
  • Bruce, G., 1977. Swedish Word Accents in Sentence Perspective. Gleerup, Lund,...
  • Bruce, G., Granström, B., House, D., 1992. Prosodic phrasing in Swedish speech synthesis. In: Bailly, G., Benoit C....
  • Bruce, G., Granström, B., Gustafson, K., House, D., 1993. Phrasing strategies in prosodic parsing and speech synthesis....
  • Caspers, J., 1994. Pitch movements under time pressure. Effects of speech rate on the melodic marking of accents and...
  • Collier, R., 1970. The optimum position of prominence lending pitch rises. IPO Annual Progress Report 5, Eindhoven, pp....
  • Hamon, C., Moulines, E., Charpentier, F., 1989. A diphone synthesis system based on time-domain prosodic modifications...
  • Hermes, D.J., 1997. Timing of pitch movements and accentuation of syllables in Dutch. Journal of the Acoustical Society of America.
  • Hermes, D.J., Beaugendre, F., House, D., 1997a. Individual differences in accentuation boundaries in Dutch. IPO Annual...

1 Currently at Lernout & Hauspie Speech Products, Koning Albert-I Laan 64, 1780 Wemmel, Belgium.

2 The work was carried out at Lund University and the University of Skövde, Sweden.
