The AMI Meeting Transcription System: Progress and Performance

Hain, Thomas; Burget, Lukas; Dines, John; Garau, Giulia; Karafiat, Martin; Lincoln, Mike; Vepa, Jithendra; Wan, Vincent

doi:10.1007/11965152_37

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4299)

Included in the following conference series:

International Workshop on Machine Learning for Multimodal Interaction

Abstract

We present the AMI 2006 system for the transcription of speech in meetings. The system was jointly developed by multiple sites on the basis of the 2005 system for participation in the NIST RT’05 evaluations. The paper describes major developments such as improvements in automatic segmentation, cross-domain model adaptation, inclusion of MLP-based features, improvements in decoding, language modelling and vocal tract length normalisation, the use of a new decoder, and a new system architecture. This is followed by a comprehensive description of the final system and its performance in the NIST RT’06s evaluations. Compared with the previous year, word error rates on the individual headset microphone task were reduced by 20% relative.
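
To make the "20% relative" figure concrete, the following is a minimal sketch of how word error rate (WER) and a relative reduction are commonly computed. It is not the scoring tool used in the NIST evaluations; the function names `word_error_rate` and `relative_reduction` are hypothetical, and the sentences and the 30% / 24% rates are illustrative assumptions, not results from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(old_wer: float, new_wer: float) -> float:
    """Fraction of the old error rate that was removed."""
    return (old_wer - new_wer) / old_wer


# Illustrative sentences (assumed): one deleted word out of six.
wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")

# Illustrative rates (assumed, not from the paper): 30% down to 24% WER
# is a 6-point absolute reduction but a 20% relative reduction.
reduction = relative_reduction(0.30, 0.24)
```

A relative reduction is measured against the earlier error rate, so the same 20% relative improvement corresponds to different absolute gains depending on the starting WER.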

Author information

Authors and Affiliations

Department of Computer Science, University of Sheffield, Sheffield, S1 4DP, UK
Thomas Hain & Vincent Wan
Faculty of Information Engineering, Brno University of Technology, Brno, 612 66, Czech Republic
Lukas Burget & Martin Karafiat
IDIAP Research Institute, CH-1920, Martigny, Switzerland
John Dines & Jithendra Vepa
Centre for Speech Technology Research, University of Edinburgh, Edinburgh, EH8 9LW, UK
Giulia Garau & Mike Lincoln

Editor information

Editors and Affiliations

University of Edinburgh, Edinburgh, Scotland
Steve Renals
IDIAP Research Institute, Martigny, Switzerland
Samy Bengio
National Institute Of Standards and Technology, 100 Bureau Drive Stop 8940, Gaithersburg, MD, 20899
Jonathan G. Fiscus

About this paper

Cite this paper

Hain, T. et al. (2006). The AMI Meeting Transcription System: Progress and Performance. In: Renals, S., Bengio, S., Fiscus, J.G. (eds) Machine Learning for Multimodal Interaction. MLMI 2006. Lecture Notes in Computer Science, vol 4299. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11965152_37

DOI: https://doi.org/10.1007/11965152_37
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69267-6
Online ISBN: 978-3-540-69268-3
eBook Packages: Computer Science, Computer Science (R0)