The LIMSI RT07 Lecture Transcription System

Lamel, L.; Bilinski, E.; Gauvain, J. L.; Adda, G.; Barras, C.; Zhu, X.

doi:10.1007/978-3-540-68585-2_41

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 4625)

Abstract

A system to automatically transcribe lectures and presentations has been developed in the context of the FP6 Integrated Project CHIL. In addition to the seminar data recorded by the CHIL partners, widely available corpora were used to train both the acoustic and language models. Acoustic model training made use of the transcribed portion of the TED corpus of Eurospeech recordings, as well as the ICSI, ISL, and NIST meeting corpora. For language model training, text materials were extracted from a variety of online conference proceedings. Experimental results are reported for close-talking and far-field microphones on development and evaluation data.

Author information

Authors and Affiliations

LIMSI-CNRS, BP 133, 91403, Orsay Cedex, France
L. Lamel, E. Bilinski, J. L. Gauvain, G. Adda, C. Barras & X. Zhu
Also with Univ Paris-Sud, F-91405, Orsay, France
C. Barras & X. Zhu

Editor information

Rainer Stiefelhagen, Rachel Bowers, Jonathan Fiscus

Cite this paper

Lamel, L., Bilinski, E., Gauvain, J.L., Adda, G., Barras, C., Zhu, X. (2008). The LIMSI RT07 Lecture Transcription System. In: Stiefelhagen, R., Bowers, R., Fiscus, J. (eds) Multimodal Technologies for Perception of Humans. RT 2007, CLEAR 2007. Lecture Notes in Computer Science, vol 4625. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68585-2_41

DOI: https://doi.org/10.1007/978-3-540-68585-2_41
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68584-5
Online ISBN: 978-3-540-68585-2
eBook Packages: Computer Science, Computer Science (R0)
