Tandem Connectionist Feature Extraction for Conversational Speech Recognition

Zhu, Qifeng; Chen, Barry; Morgan, Nelson; Stolcke, Andreas

doi:10.1007/978-3-540-30568-2_19

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3361)

Included in the following conference series:

International Workshop on Machine Learning for Multimodal Interaction

Abstract

Multi-Layer Perceptrons (MLPs) can be used in automatic speech recognition in many ways. A particular application of this tool over the last few years has been the Tandem approach, as described in [7] and other more recent publications. Here we discuss the characteristics of the MLP-based features used for the Tandem approach, and conclude with a report on their application to conversational speech recognition. The paper shows that MLP transformations yield variables that have regular distributions, which can be further modified by taking the logarithm to make the distribution easier to model with a Gaussian-HMM. Two or more vectors of these features can easily be combined without increasing the feature dimension. We also report recognition results showing that MLP features can significantly improve recognition performance on the NIST 2001 Hub-5 evaluation set with models trained on the Switchboard Corpus, even for complex systems incorporating MMIE training and other enhancements.
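The two feature operations mentioned in the abstract, taking the logarithm of MLP outputs and combining several feature vectors without growing the dimension, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the function names, the three-class activations, the probability floor, and in particular the use of plain averaging as the combination rule, which is not necessarily the rule used in the paper.

```python
import math


def softmax(logits):
    """Convert raw MLP output activations into posterior probabilities."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def combine_streams(streams):
    """Average several posterior vectors element-wise (assumed rule).

    The result has the same dimension as each input vector.
    """
    dim = len(streams[0])
    if any(len(s) != dim for s in streams):
        raise ValueError("all posterior vectors must have the same dimension")
    return [sum(s[i] for s in streams) / len(streams) for i in range(dim)]


def log_posteriors(posteriors, floor=1e-10):
    """Take the log of each posterior; the floor (an assumption) avoids log(0)."""
    return [math.log(max(p, floor)) for p in posteriors]


# Hypothetical activations for one frame from two MLPs over three classes.
post_a = softmax([2.0, 0.5, -1.0])
post_b = softmax([1.0, 1.5, -0.5])

combined = combine_streams([post_a, post_b])  # still three values
features = log_posteriors(combined)           # log-domain features for one frame
```

Because the averaged vector is still a probability distribution over the same classes, the feature dimension stays fixed however many MLP streams are combined; concatenating the streams instead would multiply it.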

References

1. Andreou, A., Kamm, T., Cohen, J.: Experiments in Vocal Tract Normalization. In: Proc. CAIP Workshop: Frontiers in Speech Recognition II (1994)
2. Benitez, C., Burget, L., Chen, B., Dupont, S., Garudadri, H., Hermansky, H., Jain, P., Kajarekar, S., Morgan, N., Sivadas, S.: Robust ASR front-end using spectral based and discriminant features: experiments on the Aurora task. In: Eurospeech (2001)
3. Bourlard, H., Wellekens, C.: Links between Markov models and multilayer perceptrons. IEEE Trans. Pattern Anal. Machine Intell. 12, 1167–1178 (1990)
4. Chen, B., Zhu, Q., Morgan, N.: Learning long term temporal features in LVCSR using neural networks. In: ICSLP (2004) (submitted)
5. Gales, M.J.F.: Semi-tied covariance matrices for hidden Markov models. IEEE Trans. Speech and Audio Processing 7, 272–281 (1999)
6. Gao, X., Zhu, W., Shi, Q.: The IBM LVCSR System Used for 1998 Mandarin Broadcast News Transcription Evaluation. In: Proc. DARPA Broadcast News Workshop (1999)
7. Hermansky, H., Ellis, D.P.W., Sharma, S.: Tandem connectionist feature extraction for conventional HMM systems. In: Proc. ICASSP 2000, pp. 1635–1638 (2000)
8. Hermansky, H., Sharma, S.: TRAPS - Classifiers of Temporal Patterns. In: Proc. ICSLP (1998)
9. Misra, H., Bourlard, H., Tyagi, V.: New entropy based combination rules in HMM/ANN multi-stream ASR. In: Proc. ICASSP (2003)
10. Morgan, N., Bourlard, H.: Continuous speech recognition. IEEE Signal Processing Magazine 12(3), 24 (1995)
11. Morgan, N., Chen, B., Zhu, Q., Stolcke, A.: TRAPping Conversational Speech: Extending TRAP/Tandem approaches to conversational telephone speech recognition. In: ICASSP (2004)
12. Reyes-Gomez, M., Ellis, D.P.W.: Error visualization for Tandem acoustic modeling on the Aurora task. In: ICASSP (2002)
13. Robinson, A.J., Cook, G.D., Ellis, D.P.W., Fosler-Lussier, E., Renals, S.J., Williams, D.A.G.: Connectionist speech recognition of Broadcast News. Speech Communication 37(1-2), 27–45 (2002)
14. Stolcke, A., Bratt, H., Butzberger, J., Franco, H., Rao Gadde, V.R., Plauche, M., Richey, C., Shriberg, E., Sonmez, K., Weng, F., Zheng, J.: The SRI March 2000 Hub-5 conversational speech transcription system. In: Proc. NIST Transcription Workshop (2000)

Author information

Authors and Affiliations

International Computer Science Institute
Qifeng Zhu, Barry Chen, Nelson Morgan & Andreas Stolcke
University of California, Berkeley
Barry Chen & Nelson Morgan
SRI International
Andreas Stolcke

Editor information

Editors and Affiliations

IDIAP Research Institute, Martigny, Switzerland
Samy Bengio
IDIAP Research Institute, CH-1920, Martigny, Switzerland
Hervé Bourlard

Cite this paper

Zhu, Q., Chen, B., Morgan, N., Stolcke, A. (2005). Tandem Connectionist Feature Extraction for Conversational Speech Recognition. In: Bengio, S., Bourlard, H. (eds) Machine Learning for Multimodal Interaction. MLMI 2004. Lecture Notes in Computer Science, vol 3361. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30568-2_19

DOI: https://doi.org/10.1007/978-3-540-30568-2_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24509-4
Online ISBN: 978-3-540-30568-2
eBook Packages: Computer Science, Computer Science (R0)
