Elsevier

Speech Communication

Volume 55, Issue 5, June 2013, Pages 721-743
Resources for turn competition in overlapping talk

https://doi.org/10.1016/j.specom.2012.10.002

Abstract

Overlapping talk occurs frequently in multi-party conversations, and is a domain in which speakers may pursue various communicative goals. The current study focuses on turn competition. Specifically, we seek to identify the phonetic differences that discriminate turn-competitive from non-competitive overlaps. Conversation analysis techniques were used to identify competitive and non-competitive overlaps in a corpus of multi-party recordings. We then generated a set of potentially predictive features relating to prosody (F0, intensity, speech rate, pausing) and overlap placement (overlap duration, point of overlap onset, recycling etc.). Decision tree classifiers were trained on the features and tested on a classification task, in order to determine which features and feature combinations best differentiate competitive overlaps from non-competitive overlaps. It was found that overlap placement features played a greater role than prosodic features in indicating turn competition. Among the prosodic features tested, F0 and intensity were the most effective predictors of turn competition. Also, our decision tree models suggest that turn-competitive and non-competitive overlaps can be initiated by a new speaker at many different points in the current speaker’s turn. These findings have implications for the design of dialogue systems, and suggest novel hypotheses about how speakers deploy phonetic resources in everyday talk.

Highlights

► We derive decision trees that discriminate turn-competitive and non-competitive overlapping talk.
► F0 and intensity were found to be the most important prosodic resources for turn competition.
► Competitive and non-competitive overlaps could be initiated at a range of different places in the current speaker’s turn.
► Overlap placement features played a greater role than prosodic features in indicating turn competition.

Introduction

People do not usually talk at the same time. Conversations seem to be based on well-organised turn exchange systems, in which speakers take turns and cooperate to achieve overlap-free interaction, estimated to occupy around 90% of total speaking time (e.g. Shriberg et al., 2001b, Cetin and Shriberg, 2006). Simultaneous speech by two or more speakers is, nevertheless, frequently observed. If, rather than total speaking time, we consider the number of speaker turns that are overlapped, the incidence of overlapping talk is much higher. For example, Heldner and Edlund (2010) estimate that 41–45% of all turn shifts between speakers in spontaneous conversational dyads contain overlap, and Shriberg et al. (2001b) report that 30–50% of all turn exchanges in multi-party meetings contain some overlap. This raises a number of questions about the status of overlapping speech in turn-taking: Why does overlap occur with such frequency? Is it an integral part of the turn-taking system, a by-product of otherwise one-speaker-at-a-time turn exchange? Or is it a conversational tool used by speakers to achieve certain communicative goals?

Most previous studies on turn-taking and speaker overlap at least allow for the latter possibility, agreeing that overlapping talk is an environment in which turn competition may take place. It follows that some instances of speaker overlap will be turn-competitive, while other overlaps will be non-competitive. This observation raises the question that the current study seeks to address: If overlapping talk is the domain of different communicative actions such as competing vs. not competing for the turn, how do conversation participants display these differences to one another? An answer to this question would enhance our understanding of how people deploy phonetic and linguistic resources in everyday talk, enabling us to address a number of important theoretical and practical questions. Are there interactional ‘universals’ in the management of overlapping talk or is it language (or culture) specific (cf. Sidnell, 2001)? How might an answer to this question contribute to the study of intercultural communication? How do young children learn to manage turn-taking in general, and overlap in particular (cf. Wells and Corrin, 2004)? What light might this shed on the interactional problems of individuals with communication difficulties, arising for example from autism or hearing loss?

An answer to our question might also contribute to improvements in speech technology. Reidsma et al. (2011) claim that differentiating between turn-competitive and non-competitive overlapped incomings is an essential part of so-called ‘continuous conversation’ with a virtual agent. An automatic dialogue system needs to know when to yield the turn to the human user, which also involves being able to deal with cases in which the human user takes the turn while the system is still talking. To achieve this, the system has to be able to recognise such incomings as turn-competitive and employ practices for managing them. On the other hand, the dialogue system should also be able to produce non-competitive overlaps such as response tokens (backchannels) at appropriate places to acknowledge receipt of the ongoing turn (Gravano and Hirschberg, 2011). Findings on the organisation of human overlap management, and in particular on differentiating between turn-competitive and non-competitive overlaps, could thus be a particularly important source of knowledge for automatic systems that aim at spontaneous conversation with human users.

The focus of the present study is solely on the acoustic and temporal features of overlap. We make no claims about participants’ use of non-verbal cues in the realisation of turn competitiveness since, for reasons given in Section 3 below, we chose to work with the ICSI meeting corpus, for which only audio recordings exist. The role of gesture for conversational sequencing and the structuring of turn-taking has long been recognised and analysed (e.g. in Goodwin, 1980, Goodwin and Goodwin, 1986, Kendon, 1967, Bavelas et al., 2002, Barkhuysen et al., 2008). However, there has been little research specifically concerned with the relevance of non-verbal cues for turn competition in overlap, with the exception of two recent studies which support the view that gestures are relevant resources for overlap management in face-to-face discourse. Lee et al. (2008) show that adding hand movements to intensity analysis improves discrimination between turn-competitive and non-competitive overlaps in their corpus of acted scripted dialogues. In a study of French mundane conversations, Mondada and Oloff (2011) show that continuing vs. abandoning gesturing during overlap is associated with how problematic participants take the overlap to be. These studies indicate that the role of gesture and gaze in relation to phonetic features in overlapping talk is a promising area for future research. However, such research will depend on access to corpora where individual speakers are recorded on separate channels and where the video recordings provide sufficient detail of each participant’s gesture and gaze behaviour (e.g. Carletta, 2007, Kurtic et al., 2012).

The methodology of the present study draws on complementary traditions of research into overlapping talk: speech science, conversation analysis and interactional phonetics. First, we review the contribution of each of these traditions to the study of overlap. On the basis of that research, we identify a set of temporal, prosodic and other features that may be implicated in the design of overlapping talk. We describe how we constructed a collection of overlaps from naturally occurring, unscripted multi-party meetings and how these were classified as competitive or non-competitive. Decision tree analyses are then used to identify the role of prosodic and non-prosodic features in differentiating competitive from non-competitive overlaps. The resulting decision tree models enable us to make a number of empirically grounded, testable hypotheses about how human participants signal competition for the turn. Finally, we explore the theoretical and practical implications of our hypotheses.
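The decision tree step outlined above can be illustrated with a minimal sketch. It is not the authors' implementation: the feature names, the toy values and labels, and the Gini-based splitting rule are all assumptions chosen for illustration, and none of the numbers come from the ICSI data. The sketch grows a small binary tree over three hypothetical features and shows how the learned splits can be read off as candidate cues for turn competition.

```python
from collections import Counter

# Hypothetical feature names; the study's actual feature set is larger.
FEATURES = ["overlap_duration_s", "f0_increase_st", "intensity_increase_db"]

# Invented toy data (NOT from the study). Label 1 = turn-competitive,
# label 0 = non-competitive.
X = [
    (0.20, 0.5, 0.3),
    (0.30, 2.5, 3.0),
    (0.40, 0.1, 0.2),
    (1.20, 2.0, 2.5),
    (1.50, 0.4, 0.5),
    (1.80, 3.0, 4.0),
    (0.25, 0.3, 0.1),
    (1.10, 0.2, 2.8),
]
y = [0, 0, 0, 1, 1, 1, 0, 1]


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return (weighted_gini, feature_index, threshold) or None."""
    best = None
    n = len(rows)
    for f in range(len(rows[0])):
        values = sorted(set(r[f] for r in rows))
        # Candidate thresholds are midpoints between adjacent values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [lab for r, lab in zip(rows, labels) if r[f] <= t]
            right = [lab for r, lab in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    """Grow a binary tree; leaves are majority class labels."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or gini(labels) == 0.0:
        return majority
    split = best_split(rows, labels)
    if split is None:  # no feature has two distinct values
        return majority
    _, f, t = split
    left = [i for i, r in enumerate(rows) if r[f] <= t]
    right = [i for i, r in enumerate(rows) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([rows[i] for i in left],
                           [labels[i] for i in left], depth + 1, max_depth),
        "right": build_tree([rows[i] for i in right],
                            [labels[i] for i in right], depth + 1, max_depth),
    }


def predict(tree, row):
    """Follow the splits until a leaf label is reached."""
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree


tree = build_tree(X, y)
print("root split on:", FEATURES[tree["feature"]])
```

On this invented data the root split falls on overlap duration, because it is the only feature that separates the two labels cleanly. With real features, a tree of this kind would need to be evaluated on held-out overlaps before any split is interpreted.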

Speech science research into overlapping talk

As indicated above, the speech technology community has an interest in understanding more about overlapping talk, for example in order to improve spoken dialogue systems. This has fuelled research into the acoustic and temporal properties of overlap. Shriberg et al. (2001b) carried out a quantitative study of overlaps from the ICSI corpus, described below. The study is fairly typical of speech science research into overlapping talk, in that the analysis is conducted on a large corpus of audio

Materials and methods

We employ a methodology in which acoustic and temporal features, including fundamental frequency, speech intensity, speech rate and pausing, are extracted from a large collection of turn-competitive and non-competitive overlaps. A machine learning technique – decision tree modelling – is then applied to analyse the relationship between these features and turn competition. This methodological approach differs from previous interactional phonetic studies in terms of the size of the collection of

Results

To identify the prosodic and overlap placement features that characterise turn competition, we first assess the utility of the prosodic and overlap placement feature sets as individual turn competitive resources, and then describe the potential interactions between these feature groups. According to the Shapiro–Wilk test, the null hypotheses that the data follow a normal distribution could be retained for all result sets. Consequently, in the following, all significant values are reported as

Discussion

In this study, we used decision tree analysis of a large corpus of conversational speech to investigate the resources that participants might employ and orient to when competing for the speaking turn. A wide range of features were extracted from the corpus of overlap instances, including both prosodic features (e.g. F0, intensity, speech rate) and those related to the placement of overlapping talk (duration, the position of overlap onset in the current speaker’s turn, and other phenomena

Conclusion

Researchers interested in overlapping talk, irrespective of disciplinary background, have recognised that there is a fundamental distinction between accidental overlap and deliberate overlap. The mechanisms that underlie accidental overlap have been of particular interest to researchers designing speech-based computer systems that interact with human speakers; for instance, some researchers have endeavoured to identify properties of the turn in progress that might predict whether the next

Acknowledgements

The research reported here was supported by a University of Sheffield Project Studentship. Preparation of the article was facilitated by UK Arts and Humanities Research Council Grant 1-62874195. We are grateful to our annotators for their time and effort; to Ahmed Aker for invaluable assistance at various stages of the research; to Gareth Walker and John Local for their sustained interest and encouragement; and to Jens Edlund and an anonymous reviewer for their constructive comments on an

References (53)

  • J. Carletta. Unleashing the killer corpus: Experiences in creating the multi-everything AMI meeting corpus. Lang. Resour. Eval. (2007)
  • Cetin, O., Shriberg, E., 2006. Overlaps in meetings: ASR effects and analysis by dialogue factors, speakers, and...
  • E. Couper-Kuhlen. English Speech Rhythm: Form and Function in Everyday Verbal Interaction (1993)
  • V. Dellwo et al. The perception of intended speech rate in English, French, and German by French speakers
  • Dhillon, R., Bhagat, S.H.C., Shriberg, E., 2004. Meeting recorder project: Dialog act labelling guide. Technical Report...
  • P. French et al. Turn-competitive incomings. J. Pragmatics (1983)
  • R. Gardner. When listeners talk: Response tokens and listener stance
  • C. Goodwin. Restarts, pauses, and the achievement of a state of mutual gaze at turn-beginning. Sociol. Inquiry (1980)
  • M. Goodwin et al. Gesture and coparticipation in the activity of searching for a word. Semiotica (1986)
  • J. Gorisch et al. Pitch contour matching and interactional alignment across turns: An acoustic investigation. Lang. Speech (2012)
  • T. Hain et al. Transcribing meetings with the AMIDA systems. IEEE Trans. Audio Speech Lang. Process. (2012)
  • M. Heldner. Detection thresholds for gaps, overlaps, and no-gap-no-overlaps. J. Acoust. Soc. Amer. (2011)
  • Janin, A., Baron, D., Edwards, J., Ellis, D., Gelbart, D., Morgan, N., Peskin, B., Pfau, T., Shriberg, E., Stolcke, A.,...
  • G. Jefferson. Two explorations of the organisation of overlapping talk in conversation, 1: Notes on some orderliness of overlap onset (1983)
  • Jefferson, G., 1987. Notes on ‘latency’ in overlap onset. In: Button, G., Drew, P., Heritage, J. (Eds.), Interaction...
  • G. Jefferson. A sketch of some orderly aspects of overlap in natural conversation
