The 2007 AMI(DA) System for Meeting Transcription

Hain, Thomas; Burget, Lukas; Dines, John; Garau, Giulia; Karafiat, Martin; van Leeuwen, David; Lincoln, Mike; Wan, Vincent

doi:10.1007/978-3-540-68585-2_39

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 4625)

Abstract

Meeting transcription is one of the main tasks for large vocabulary automatic speech recognition (ASR) and is supported by several large international projects in the area. The conversational nature of the speech, the difficult acoustics, and the need for high-quality transcripts for higher-level processing make ASR of meeting recordings an interesting challenge. This paper describes the development and system architecture of the 2007 AMIDA meeting transcription system, the third such system developed in a collaboration of six research sites. Different variants of the system participated in all speech-to-text transcription tasks of the 2007 NIST RT evaluations and showed very competitive performance. The best result was obtained on close-talking microphone data, with a final word error rate of 24.9%.

Author information

Authors and Affiliations

Department of Computer Science, University of Sheffield, Sheffield, S1 4DP, UK
Thomas Hain & Vincent Wan
Faculty of Information Engineering, Brno University of Technology, Brno, 612 66, Czech Republic
Lukas Burget & Martin Karafiat
IDIAP Research Institute, CH-1920, Martigny, Switzerland
John Dines & Mike Lincoln
Centre for Speech Technology Research, University of Edinburgh, Edinburgh, EH8 9LW, UK
Giulia Garau & Mike Lincoln
TNO, 2600 AD, Delft, The Netherlands
David van Leeuwen

Editor information

Rainer Stiefelhagen, Rachel Bowers, Jonathan Fiscus

About this paper

Cite this paper

Hain, T. et al. (2008). The 2007 AMI(DA) System for Meeting Transcription. In: Stiefelhagen, R., Bowers, R., Fiscus, J. (eds) Multimodal Technologies for Perception of Humans. RT 2007, CLEAR 2007. Lecture Notes in Computer Science, vol 4625. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68585-2_39

DOI: https://doi.org/10.1007/978-3-540-68585-2_39
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68584-5
Online ISBN: 978-3-540-68585-2
eBook Packages: Computer Science, Computer Science (R0)
