The SRI-ICSI Spring 2007 Meeting and Lecture Recognition System

Stolcke, Andreas; Anguera, Xavier; Boakye, Kofi; Çetin, Özgür; Janin, Adam; Magimai-Doss, Mathew; Wooters, Chuck; Zheng, Jing

doi:10.1007/978-3-540-68585-2_42

Andreas Stolcke¹,²,
Xavier Anguera²,
Kofi Boakye²,
Özgür Çetin³,
Adam Janin²,
Mathew Magimai-Doss²,
Chuck Wooters² &
Jing Zheng¹

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 4625)


Abstract

We describe the latest version of the SRI-ICSI meeting and lecture recognition system, as was used in the NIST RT-07 evaluations, highlighting improvements made over the last year. Changes in the acoustic preprocessing include updated beamforming software for processing of multiple distant microphones, and various adjustments to the speech segmenter for close-talking microphones. Acoustic models were improved by the combined use of neural-net-estimated phone posterior features, discriminative feature transforms trained with fMPE-MAP, and discriminative Gaussian estimation using MPE-MAP, as well as model adaptation specifically to nonnative and non-American speakers. The net effect of these enhancements was a 14-16% relative error reduction on distant microphones, and a 16-17% error reduction on close-talking microphones. Also, for the first time, we report results on a new “coffee break” meeting genre, and on a new NIST metric designed to evaluate combined speech diarization and recognition.
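The improvements above are quoted as relative error reductions, that is, the drop in error rate divided by the baseline error rate. The following is a minimal sketch of that arithmetic; the function name and the two word error rates are hypothetical illustrations and are not figures from the paper.

```python
def relative_error_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the relative reduction in error rate as a fraction of the baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical word error rates in percent (assumed values, not from the paper).
baseline = 40.0
improved = 34.0

# An absolute drop of 6 points from a 40% baseline is a 15% relative reduction.
print(f"{relative_error_reduction(baseline, improved):.1%}")
```

Under this definition, the same relative reduction corresponds to a smaller absolute gain when the baseline error rate is lower, which is worth keeping in mind when comparing the distant-microphone and close-talking figures.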



Author information

Authors and Affiliations

SRI International, Menlo Park, CA, U.S.A.
Andreas Stolcke & Jing Zheng
International Computer Science Institute, Berkeley, CA, U.S.A.
Andreas Stolcke, Xavier Anguera, Kofi Boakye, Adam Janin, Mathew Magimai-Doss & Chuck Wooters
Yahoo, Inc.
Özgür Çetin


Editor information

Rainer Stiefelhagen, Rachel Bowers, Jonathan Fiscus

Cite this paper

Stolcke, A. et al. (2008). The SRI-ICSI Spring 2007 Meeting and Lecture Recognition System. In: Stiefelhagen, R., Bowers, R., Fiscus, J. (eds) Multimodal Technologies for Perception of Humans. RT 2007, CLEAR 2007. Lecture Notes in Computer Science, vol 4625. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68585-2_42


DOI: https://doi.org/10.1007/978-3-540-68585-2_42
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68584-5
Online ISBN: 978-3-540-68585-2
eBook Packages: Computer Science, Computer Science (R0)
