Abstract
We analyze speaker overlap in multiparty meetings both in terms of automatic speech recognition (ASR) performance and in terms of the distribution of overlap with respect to various factors (collection site, speakers, dialog acts, and hot spots). Unlike most previous work on overlap or crosstalk, our ASR error analysis uses an approach that allows comparison of the same foreground speech with and without naturally occurring overlap, using a state-of-the-art meeting recognition system. We examine a total of 101 meetings. For the ASR analysis, we use 26 meetings from the NIST meeting transcription evaluations and find a number of interesting phenomena. First, overlaps tend to occur at high-perplexity regions in the foreground talker’s speech. Second, overlap regions tend to have higher perplexity than nonoverlap regions when trigrams or 4-grams are used, but unigram perplexity within overlaps is considerably lower than within nonoverlaps. Third, word error rate (WER) after an overlap is consistently lower than before it, apparently because the foreground speaker reduces perplexity shortly after being overlapped. These appear to be robust findings, because they hold in general across meetings from different collection sites, even though meeting style and absolute rates of overlap vary by site. Further analyses of overlap with respect to speakers and meeting content were conducted on a set of 75 additional meetings collected and annotated at ICSI. These analyses reveal interesting relationships between overlap and dialog acts, as well as between overlap and “hot spots” (points of increased participant involvement). Finally, results from this larger data set show that individual speakers have widely varying rates of being overlapped.
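The abstract compares regions before, during, and after an overlap using two standard measures: WER and language-model perplexity. As a minimal sketch of how those two measures are computed (not the authors' scoring pipeline), the Python below assumes each region is already available as reference and hypothesis word lists, and that per-word log10 probabilities come from some n-gram LM scorer. The function names and toy data are hypothetical.

```python
from typing import Sequence, Tuple


def perplexity(word_log10_probs: Sequence[float]) -> float:
    """Perplexity of a region from per-word log10 probabilities
    (hypothetical input, e.g. from an n-gram LM scorer): 10 ** (-mean log10 p)."""
    if not word_log10_probs:
        raise ValueError("empty region")
    return 10 ** (-sum(word_log10_probs) / len(word_log10_probs))


def word_errors(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Word-level Levenshtein distance: the minimum number of
    substitutions, deletions and insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))  # distances from an empty ref prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,              # deletion of r
                         cur[j - 1] + 1,           # insertion of h
                         prev[j - 1] + (r != h))   # substitution or match
        prev = cur
    return prev[-1]


def wer(pairs: Sequence[Tuple[Sequence[str], Sequence[str]]]) -> float:
    """Pooled WER over (ref, hyp) pairs: total errors / total reference words."""
    words = sum(len(ref) for ref, _ in pairs)
    if words == 0:
        raise ValueError("no reference words")
    return sum(word_errors(ref, hyp) for ref, hyp in pairs) / words


if __name__ == "__main__":
    # Hypothetical toy regions just before and just after one overlap.
    before = [("so we need to fix the".split(), "so we need a fix a".split())]
    after = [("right exactly".split(), "right exactly".split())]
    print(f"WER before: {wer(before):.3f}, after: {wer(after):.3f}")
    print(f"perplexity: {perplexity([-2.0, -1.0]):.1f}")
```

Because `perplexity` depends only on the per-word probabilities passed in, the same function gives unigram or trigram perplexity depending on which model scored the words; this is how the two measures can rank the same overlap region differently.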
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Çetin, Ö., Shriberg, E. (2006). Overlap in Meetings: ASR Effects and Analysis by Dialog Factors, Speakers, and Collection Site. In: Renals, S., Bengio, S., Fiscus, J.G. (eds) Machine Learning for Multimodal Interaction. MLMI 2006. Lecture Notes in Computer Science, vol 4299. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11965152_19
DOI: https://doi.org/10.1007/11965152_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69267-6
Online ISBN: 978-3-540-69268-3
eBook Packages: Computer Science, Computer Science (R0)