Speech Communication

Volume 33, Issues 1–2, January 2001, Pages 113-134

Representational issues in annotation: Using the Australian map task corpus to relate prosody and discourse structure

https://doi.org/10.1016/S0167-6393(00)00072-8

Abstract

This paper reports part of an ongoing investigation of the interaction of prosody and discourse structure. A digital speech corpus (four dialogues from the ANDOSL Australian map task corpus) was coded for prosodic structure (ToBI). Independently, two different coding systems for dialogue micro-structure were applied to the same corpus: the HCRC map task coding scheme (Carletta et al., 1996, 1997b) and the 'Switchboard' version of the DRI/DAMSL scheme (Jurafsky et al., 1997). We investigated whether silent pause location and duration, intonational boundaries associated with Break Indices 3 and 4, as well as pitch range reset were significantly correlated with dialogue act boundaries as has been found for other varieties of English (e.g., Lehiste, 1975; Hirschberg and Nakatani, 1996; Silverman, 1987) and Dutch (Swerts, 1997). The dialogue coding systems were systematically evaluated both against one another and in terms of their correlation with the prosodic structure. The paper explores a number of methodological issues which arise in effectively comparing and relating structures from different domains of analysis across a large speech corpus. It also exemplifies the way in which annotated corpora can be used to evaluate theories and systems.

Introduction

Recent years have seen the appearance of a plethora of large speech databases and at the same time a growing impetus towards the development of computer dialogue management systems. The former has encouraged the development of intonational and prosodic coding tools such as Tones and Break Indices (ToBI) (Beckman and Ayers Elam, 1994/1997), while the latter has seen significant attention being paid to dialogue annotation schemes. The study reported in this paper is a pilot for a larger ongoing study investigating the interaction of prosody and discourse structure and making use of one such large speech database: the ANDOSL Australian map task corpus (Millar et al., 1994).

While ToBI has become one standard for prosodic annotation of English, many different dialogue annotation systems are in existence. For example, Cooper et al. (1999) in their review of such systems mention five major ones: the HCRC scheme; DRI/DAMSL; the Linköping University system (Ahrenberg et al., 1995); the TRAINS system (Traum and Hinkelman, 1992); and the 'GBG-IM' scheme (Allwood et al., 1994) (see also Traum, 1998, for a survey of existing systems). The desire for a standard framework exhibiting a high level of cross-coder reliability is reflected in the activities of the Discourse Resource Initiative (DRI), which has as one of its goals the development of a general-purpose scheme for coding dialogue acts and higher level dialogue structure (Carletta et al., 1997a; Core et al., 1999).

A number of recent studies have examined the interaction between various correlates of prosodic structure and discourse segmentation (e.g., Hirschberg and Nakatani, 1996; Swerts, 1997; Grice et al., 1995) or have sought to use various acoustic parameters associated with prosodic structure such as duration, F0, pause length and speaking rate to automatically classify utterances as specific kinds of dialogue acts (e.g., Shriberg et al., 1998). Some studies, for example, have found a close (although not perfect) correlation between final lengthening and the presence of a low or high boundary tone, and discourse segment strength (e.g., Swerts, 1997). Similarly, Nakatani et al. (1995) and Grosz and Hirschberg (1992) have shown that transcribers can reliably identify discourse segment boundaries (77.3–91.7% agreement among transcribers) and that acoustic and prosodic features are often associated with discourse segment boundaries.

In the pilot study reported here we independently applied a range of dialogue coding systems to the same corpus of dialogues. By applying the coding systems to the same dialogues, we were able to evaluate them both against one another (in terms of the kinds of information they were able to capture) and in terms of their correlation with prosodic phrase boundaries.
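One simple way to quantify how well a dialogue coding lines up with prosodic phrase boundaries is to count how many dialogue act boundaries fall close to a boundary carrying Break Index 3 or 4. The following is a minimal sketch under stated assumptions, not the procedure used in the study: the function name `coincidence_rate`, the 50 ms tolerance, and the toy time stamps are all hypothetical.

```python
from bisect import bisect_left


def coincidence_rate(act_boundaries, prosodic_boundaries, tolerance=0.05):
    """Fraction of dialogue act boundaries (in seconds) that lie within
    `tolerance` seconds of some prosodic boundary."""
    prosodic = sorted(prosodic_boundaries)
    if not act_boundaries:
        return 0.0
    hits = 0
    for t in act_boundaries:
        # Only the nearest prosodic boundary on each side of t can match.
        i = bisect_left(prosodic, t)
        neighbours = prosodic[max(i - 1, 0):i + 1]
        if any(abs(t - p) <= tolerance for p in neighbours):
            hits += 1
    return hits / len(act_boundaries)


# Hypothetical toy data, not taken from the corpus:
# (time in seconds, break index) pairs and dialogue act end times.
breaks = [(1.20, 3), (2.05, 1), (3.40, 4), (4.10, 2), (5.75, 4)]
act_ends = [1.22, 3.40, 4.10, 5.72]

# Keep only the stronger boundaries (Break Indices 3 and 4).
strong_breaks = [t for t, index in breaks if index >= 3]
rate = coincidence_rate(act_ends, strong_breaks)
print(f"{rate:.2f}")
```

In this toy example three of the four dialogue act boundaries coincide with a Break Index 3 or 4 boundary. Running the same count separately for each coding scheme gives one crude basis for comparing them.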

In this paper we report the results for coding discourse structure at the micro-level only (dialogue acts or moves at roughly the level of the turn or below). The two dialogue act coding schemes used were the HCRC map task coding scheme (Carletta et al., 1996; Carletta et al., 1997b; Isard and Carletta, 1995) and the DRI/DAMSL scheme (Allen and Core, 1997; Jurafsky et al., 1997; Cooper et al., 1999). Coding for dialogue structure at 'meso-' and 'macro-' levels has also been completed for the corpus considered and will be reported in future publications (cf. Carletta et al., 1997b; Nakatani and Traum, 1999).

The general research problem we address in this paper is the correspondence (if any) between prosodic features and dialogue acts. We examine correlations between dialogue acts and different kinds of prosodic constituents and phonetic parameters such as pause duration and fundamental frequency variation between prosodic phrases. We also address two more specific methodological issues that arise in tackling this problem. First, how can we empirically compare dialogue act coding systems? Second, of what value can a pilot study making such a comparison be in deciding on an appropriate coding system to use in annotating a larger corpus for a specific purpose such as this?

The ANDOSL map task

Dialogues from the map task section of the Australian National Database of Spoken Language (ANDOSL; Millar et al., 1994) formed the corpus for this study. The ANDOSL corpus includes 216 dialogues from native speakers of Australian English of various ages who have been classified as belonging to one of the three main dialectal groupings for Australian English: broad, general, and educated/cultivated. The ANDOSL map task is closely modelled on the HCRC map task (Anderson et al., 1991).

Actual mappings between HCRC and DAMSL categories

For each HCRC move label, the types and frequencies of association with DAMSL dialogue act labels were calculated.
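The calculation described above amounts to a cross-tabulation of the two label sets. A minimal sketch follows, assuming that each HCRC move has already been paired with the DAMSL dialogue act or acts it overlaps; that pairing step, the function names, and the toy label strings are hypothetical illustrations and not data from the study.

```python
from collections import Counter, defaultdict


def cross_tabulate(pairs):
    """Count how often each HCRC move label co-occurs with each DAMSL label."""
    table = defaultdict(Counter)
    for hcrc_label, damsl_label in pairs:
        table[hcrc_label][damsl_label] += 1
    return table


def row_percentages(table):
    """Express each DAMSL count as a percentage of its HCRC category."""
    result = {}
    for hcrc_label, counts in table.items():
        total = sum(counts.values())
        result[hcrc_label] = {
            damsl_label: 100.0 * n / total for damsl_label, n in counts.items()
        }
    return result


# Hypothetical (HCRC move, DAMSL act) pairs, for illustration only.
pairs = (
    [("instruct", "action-directive")] * 3
    + [("instruct", "statement")]
    + [("query-yn", "yes-no-question")] * 2
)

table = cross_tabulate(pairs)
percentages = row_percentages(table)
for hcrc_label, shares in percentages.items():
    print(hcrc_label, shares)
```

Because one HCRC move may correspond to several DAMSL acts, a move can contribute more than one pair, so the percentages describe associations and not a one-to-one relabelling.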

There were 1010 HCRC-labelled moves in total, and 1201 DAMSL-labelled dialogue acts (a detailed account of their distribution is given below when associations with Break Indices are discussed). Because the DAMSL system generally offers a wider range of labels to choose from, as well as a finer granularity in applying labels to the dialogue, there were numerous cases in which a

Mappings between HCRC and DAMSL categories

Tables 12 and 13 summarise the actual mappings between the HCRC categories and the pooled DAMSL categories of Table 6, as compared to the predicted mappings detailed in Tables 3 and 4 in the Introduction (Tables 12 and 13 thus present a summary of the information in Table 6). A major category mapping is one where the DAMSL category accounts for more than 10% of the HCRC category. A minor category mapping is one where the DAMSL category accounts for 10% or less of the HCRC category. As in
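The 10% threshold that separates major from minor mappings can be sketched as follows. The function name `classify_mappings` and the percentage values are hypothetical; only the threshold rule itself comes from the definitions above.

```python
def classify_mappings(shares, threshold=10.0):
    """Split the DAMSL categories associated with one HCRC category into
    major mappings (more than `threshold` percent of the HCRC category)
    and minor mappings (`threshold` percent or less, but above zero)."""
    major = sorted(label for label, pct in shares.items() if pct > threshold)
    minor = sorted(label for label, pct in shares.items() if 0 < pct <= threshold)
    return major, minor


# Hypothetical percentages for a single HCRC category, for illustration only.
shares = {"action-directive": 82.0, "statement": 10.0, "open-option": 8.0}

major, minor = classify_mappings(shares)
print("major:", major)
print("minor:", minor)
```

A category that accounts for exactly 10% counts as minor here, matching the wording "10% or less".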

Conclusion

One of the major methodological issues for work on large speech databases is how to choose a coding scheme or 'tune' a scheme to a domain. We conclude from the work reported above that considerable research effort can be saved by undertaking a pilot study of this kind in deciding on the kinds of coding systems to be used for particular domains of speech data and particular research questions. Furthermore, although the results reported are limited in the sense that only four dialogues

Acknowledgements

We are grateful to Jonathan Harrington and three anonymous reviewers for helpful comments on an earlier draft of this paper.

References

  • Harrington, J., et al., 1993. The mu+ system of database analysis. Computer, Speech, and Language.
  • Ahrenberg, L., Dahlbäck, N., Jonsson, A., 1995. Coding schemes for studies of natural language dialogue. In: Working...
  • Allwood, J., Nivre, J., Ahlsén, E., 1994. Semantics and spoken language: Manual for coding interaction management....
  • Allen, J., Core, M., 1997. Draft of DAMSL: Dialog Act Markup in Several Layers. Draft contribution for the Discourse...
  • Anderson, A., et al., 1991. The HCRC map task corpus. Language and Speech.
  • Beckman, M.E., Ayers Elam, G., 1994/1997. Guide to ToBI Labelling – Version 3.0. Electronic text and accompanying audio...
  • Beckman, M.E., Pierrehumbert, J., 1986. Intonational Structure in English and Japanese. Phonology Yearbook, Vol. 3, pp....
  • Carletta, J., Isard, A., Isard, S., Kowtko, J., Doherty-Sneddon, G., Anderson, A., 1996. HCRC dialogue structure coding...
  • Carletta, J., Dahlbäck, N., Reithinger, N., Walker, M. (Eds.), 1997a. Standards for dialogue coding in natural language...
  • Carletta, J., et al., 1997b. The reliability of a dialogue structure coding scheme. Computational Linguistics.
  • Cooper, R., Larsson, S., Matheson, C., Poesio, M., Traum, D., 1999. Coding instructional dialogue for information...
  • Core, M., Ishizaki, M., Moore, J., Nakatani, C., Reithinger, N., Traum, D., Tutiya, S., 1999. The report of the third...
  • Fletcher, J., Harrington, J., 1996. Timing of intonational events in Australian English. In: McCormack, P., Russell, A....
  • Grice, M., Savino, M., 1995. Intonation and communicative function in a regional variety of Italian. In: Phonus 1....