Overlap in Meetings: ASR Effects and Analysis by Dialog Factors, Speakers, and Collection Site

Çetin, Özgür; Shriberg, Elizabeth

doi:10.1007/11965152_19

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4299)

Included in the following conference series:

International Workshop on Machine Learning for Multimodal Interaction

Abstract

We analyze speaker overlap in multiparty meetings both in terms of automatic speech recognition (ASR) performance and in terms of the distribution of overlap with respect to various factors (collection site, speakers, dialog acts, and hot spots). Unlike most previous work on overlap or crosstalk, our ASR error analysis uses an approach that allows comparison of the same foreground speech with and without naturally occurring overlap, using a state-of-the-art meeting recognition system. We examine a total of 101 meetings. For the analysis of ASR, we use 26 meetings from the NIST meeting transcription evaluations and discover a number of interesting phenomena. First, overlaps tend to occur at high-perplexity regions in the foreground talker’s speech. Second, overlap regions tend to have higher perplexity than nonoverlap regions when trigrams or 4-grams are used, but unigram perplexity within overlaps is considerably lower than that within nonoverlaps. Third, word error rate (WER) after an overlap is consistently lower than WER before it, apparently because the foreground speaker reduces perplexity shortly after being overlapped. These appear to be robust findings, because they hold in general across meetings from different collection sites, even though meeting style and absolute rates of overlap vary by site. Further analyses of overlap with respect to speakers and meeting content were conducted on a set of 75 additional meetings collected and annotated at ICSI. These analyses reveal interesting relationships between overlap and dialog acts, as well as between overlap and “hot spots” (points of increased participant involvement). Finally, results from this larger data set show that individual speakers have widely varying rates of being overlapped.
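
The abstract relies on two quantities, word error rate (WER) and language-model perplexity. The Python sketch below illustrates how each is commonly defined; it is not the authors' recognition or scoring setup. The function names, the whitespace tokenization, and the add-one smoothing (with the test words folded into the vocabulary) are all assumptions made for illustration, and the perplexity function covers only the unigram case.

```python
import math
from collections import Counter


def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(prev[j] + 1,         # deletion of a reference word
                          curr[j - 1] + 1,     # insertion of a hypothesis word
                          prev[j - 1] + cost)  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def unigram_perplexity(train_words, test_words):
    """Perplexity of test_words under an add-one-smoothed unigram model (illustrative)."""
    if not test_words:
        raise ValueError("test_words must not be empty")
    counts = Counter(train_words)
    # Assumption: the vocabulary is the union of training and test words.
    vocab = set(train_words) | set(test_words)
    total = len(train_words) + len(vocab)
    log_prob = sum(math.log((counts[w] + 1) / total) for w in test_words)
    return math.exp(-log_prob / len(test_words))
```

Under these assumptions, a before/after comparison of the kind the abstract describes would amount to calling `word_error_rate` separately on the words preceding and following each overlap, and calling a perplexity function separately on overlap and nonoverlap regions.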

Author information

Authors and Affiliations

International Computer Science Institute, Berkeley, CA, USA
Özgür Çetin & Elizabeth Shriberg
SRI International, Menlo Park, CA, USA
Elizabeth Shriberg

Editor information

Editors and Affiliations

University of Edinburgh, Edinburgh, Scotland
Steve Renals
IDIAP Research Institute, Martigny, Switzerland
Samy Bengio
National Institute Of Standards and Technology, 100 Bureau Drive Stop 8940, Gaithersburg, MD, 20899
Jonathan G. Fiscus

Cite this paper

Çetin, Ö., Shriberg, E. (2006). Overlap in Meetings: ASR Effects and Analysis by Dialog Factors, Speakers, and Collection Site. In: Renals, S., Bengio, S., Fiscus, J.G. (eds) Machine Learning for Multimodal Interaction. MLMI 2006. Lecture Notes in Computer Science, vol 4299. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11965152_19

DOI: https://doi.org/10.1007/11965152_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69267-6
Online ISBN: 978-3-540-69268-3
eBook Packages: Computer Science (R0)
