Abstract
We present the design and results of the Spring 2006 (RT-06S) Rich Transcription Meeting Recognition Evaluation; the fourth in a series of community-wide evaluations of language technologies in the meeting domain. For 2006, we supported three evaluation tasks in two meeting sub-domains: the Speech-To-Text (STT) transcription task, and the “Who Spoke When” and “Speech Activity Detection” diarization tasks. The meetings were from the Conference Meeting, and Lecture Meeting sub-domains. The lowest STT word error rate, with up to four simultaneous speakers, in the multiple distant microphone condition was 46.3% for the conference sub-domain, and 53.4% for the lecture sub-domain. For the “Who Spoke When” task, the lowest diarization error rates for all speech were 35.8% and 24.0% for the conference and lecture sub-domains respectively. For the “Speech Activity Detection” task, the lowest diarization error rates were 4.3% and 8.0% for the conference and lecture sub-domains respectively.
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Fiscus, et al.: Results of the Fall 2004 STT and MDE Evaluation. In: RT-04F Evaluation Workshop Proceedings, November 7–10 (2004)
Garofolo, et al.: The Rich Transcription 2004 Spring Meeting Recognition Evaluation. In: ICASSP 2004 Meeting Recognition Workshop, May 17 (2004)
Spring 2006 (RT-06S) Rich Transcription Meeting Recognition Evaluation Plan (2006), http://www.nist.gov/speech/tests/rt/rt2006/spring/
LDC Meeting Recording Transcription, http://www.ldc.upenn.edu/Projects/Transcription/NISTMeet
SCTK toolkit, http://www.nist.gov/speech/tools/index.htm
Michel, et al.: The NIST Meeting Room Phase II Corpus. In: Renals, S., Bengio, S., Fiscus, J.G. (eds.) MLMI 2006. LNCS, vol. 4299. Springer, Heidelberg (2006)
Fiscus, et al.: The Rich Transcription 2005 Spring Meeting Recognition Evaluation. In: Renals, S., Bengio, S. (eds.) MLMI 2005. LNCS, vol. 3869. Springer, Heidelberg (2006)
Fiscus, et al.: Multiple Dimension Levenshtein Distance Calculations for Evaluating Automatic Speech Recognition Systems During Simultaneous Speech. In: LREC 2006. Sixth International Conference on Language Resources and Evaluation (2006)
Gehrig, McDonough: Tracking Multiple Simultaneous Speakers with Probabilistic Data Association Filters. In: Renals, S., Bengio, S., Fiscus, J.G. (eds.) MLMI 2006. LNCS, vol. 4299. Springer, Heidelberg (2006)
Stanford, V.: The NIST Mark-III microphone array - infrastructure, reference data, and metrics. In: Proceedings International Workshop on Microphone Array Systems - Theory and Practice, Pommersfelden, Germany (2003)
http://isl.ira.uka.de/clear06/downloads/ClearEval_Protocol_v5.pdf
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Fiscus, J.G., Ajot, J., Michel, M., Garofolo, J.S. (2006). The Rich Transcription 2006 Spring Meeting Recognition Evaluation. In: Renals, S., Bengio, S., Fiscus, J.G. (eds) Machine Learning for Multimodal Interaction. MLMI 2006. Lecture Notes in Computer Science, vol 4299. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11965152_28
Download citation
DOI: https://doi.org/10.1007/11965152_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69267-6
Online ISBN: 978-3-540-69268-3
eBook Packages: Computer ScienceComputer Science (R0)