A Clustering Algorithm for the Fast Match of Acoustic Conditions in Continuous Speech Recognition

Rodríguez, Luis Javier; Torres, M. Inés

doi:10.1007/11492542_69

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 3523)

Included in the following conference series:

Iberian Conference on Pattern Recognition and Image Analysis

Abstract

In practical speech recognition applications, channel/environment conditions may not match those of the corpus used to estimate the acoustic models. This paper proposes a straightforward methodology by which the speech recognizer can match the acoustic conditions of input utterances, thus allowing instantaneous adaptation schemes. First, a number of clusters is determined in the training material in a fully unsupervised way, using a dissimilarity measure based on shallow acoustic models. Then, accurate acoustic models are estimated for each cluster. Finally, a fast match strategy, based on the shallow models, is used to choose the most likely acoustic condition for each input utterance. The performance of the clustering algorithm was tested on two speech databases in Spanish: SENGLAR (read speech) and CORLEC-EHU-1 (spontaneous human-human dialogues). In both cases, speech utterances were consistently grouped by gender, by recording conditions or by background/channel noise. Furthermore, the fast match methodology led to noticeable improvements in preliminary phonetic recognition experiments, at 20-50% of the computational cost of the maximum-likelihood (ML) match.
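The abstract does not say what form the shallow models or the dissimilarity measure take, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes each cluster's shallow model is a single diagonal-covariance Gaussian over feature frames, uses a symmetric cross-likelihood dissimilarity, and picks the acoustic condition whose shallow model gives the highest average log-likelihood. The names `DiagGaussian`, `dissimilarity` and `fast_match` are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Frame = Sequence[float]  # one acoustic feature vector


class DiagGaussian:
    """Assumed 'shallow' model: one diagonal-covariance Gaussian (illustrative only)."""

    def __init__(self, mean: List[float], var: List[float]) -> None:
        self.mean = mean
        self.var = var

    @classmethod
    def fit(cls, frames: Sequence[Frame], var_floor: float = 1e-3) -> "DiagGaussian":
        # Maximum-likelihood mean and variance per dimension, with a variance floor.
        n, dim = len(frames), len(frames[0])
        mean = [sum(f[d] for f in frames) / n for d in range(dim)]
        var = [
            max(sum((f[d] - mean[d]) ** 2 for f in frames) / n, var_floor)
            for d in range(dim)
        ]
        return cls(mean, var)

    def avg_log_likelihood(self, frames: Sequence[Frame]) -> float:
        # Average per-frame log-likelihood under the diagonal Gaussian.
        total = 0.0
        for f in frames:
            for x, m, v in zip(f, self.mean, self.var):
                total += -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
        return total / len(frames)


def dissimilarity(a: Sequence[Frame], b: Sequence[Frame]) -> float:
    """Assumed symmetric cross-likelihood dissimilarity between two frame sets."""
    model_a, model_b = DiagGaussian.fit(a), DiagGaussian.fit(b)
    loss_a = model_a.avg_log_likelihood(a) - model_b.avg_log_likelihood(a)
    loss_b = model_b.avg_log_likelihood(b) - model_a.avg_log_likelihood(b)
    return loss_a + loss_b


def fast_match(frames: Sequence[Frame], shallow: Dict[str, DiagGaussian]) -> str:
    """Return the cluster label whose shallow model best explains the utterance."""
    return max(shallow, key=lambda label: shallow[label].avg_log_likelihood(frames))


# Toy data standing in for two acoustic conditions (made-up numbers).
cond_a = [[0.0, 0.0], [0.2, -0.1], [-0.1, 0.1]]
cond_b = [[5.0, 5.0], [5.2, 4.9], [4.9, 5.1]]
shallow_models = {"a": DiagGaussian.fit(cond_a), "b": DiagGaussian.fit(cond_b)}
chosen = fast_match([[4.8, 5.0], [5.1, 5.2]], shallow_models)
```

In a recognizer built along these lines, the label returned by `fast_match` would select which set of accurate, cluster-specific acoustic models to decode with, so the expensive models are evaluated only once per utterance.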

Author information

Authors and Affiliations

Pattern Recognition & Speech Technology Group, DEE. Facultad de Ciencia y Tecnología, Universidad del País Vasco, Apartado 644, 48080, Bilbao, Spain
Luis Javier Rodríguez & M. Inés Torres

Editor information

Editors and Affiliations

Instituto Superior Técnico & Instituto de Sistemas e Robótica, 1049-001, Lisboa, Portugal
Jorge S. Marques
ETSI Informática y de Telecomunicación, University of Granada, 18071, Granada, Spain
Nicolás Pérez de la Blanca
Instituto Superior Técnico, CERENA-Centro de Recursos Naturais e Ambiente, Av. Rovisco Pais, 1049-001, Lisboa, Portugal
Pedro Pina

Cite this paper

Rodríguez, L.J., Torres, M.I. (2005). A Clustering Algorithm for the Fast Match of Acoustic Conditions in Continuous Speech Recognition. In: Marques, J.S., Pérez de la Blanca, N., Pina, P. (eds) Pattern Recognition and Image Analysis. IbPRIA 2005. Lecture Notes in Computer Science, vol 3523. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11492542_69

DOI: https://doi.org/10.1007/11492542_69
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-26154-4
Online ISBN: 978-3-540-32238-2
eBook Packages: Computer Science, Computer Science (R0)
