Two-stage model-based feature compensation for robust speech recognition

Shen, Haifeng; Liu, Gang; Guo, Jun

doi:10.1007/s00607-011-0152-1

Two-stage model-based feature compensation for robust speech recognition

Published: 25 August 2011

Volume 94, pages 1–20, (2012)
Cite this article

Computing Aims and scope Submit manuscript

Haifeng Shen¹,
Gang Liu¹ &
Jun Guo¹

126 Accesses
2 Citations
Explore all metrics

Abstract

This paper presents a combination approach to robust speech recognition by using two-stage model-based feature compensation. Gaussian mixture model (GMM)-based and hidden Markov model (HMM)-based compensation approaches are combined together and conducted sequentially in the multiple-decoding recognition system. The clean speech is firstly modeled as a GMM in the initial pass, and then modeled as a HMM generated from the initial pass in the following passes, respectively. The environment parameter estimation on these two modeling strategies are formulated both under maximum a posteriori (MAP) criterion. Experimental result shows that a significant improvement is achieved compared to European Telecommunications Standards Institute (ETSI) advanced compensation approach, GMM-based feature compensation approach, HMM-based feature compensation approach, and acoustic model compensation approach.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Feature compensation based on independent noise estimation for robust speech recognition

Article Open access 16 June 2021

Investigation and Development of Methods for Improving Robustness of Automatic Speech Recognition Algorithms in Complex Acoustic Environments

Spectral Reconstruction and Noise Model Estimation Based on a Masking Model for Noise Robust Speech Recognition

Article 06 January 2017

References

Hermansky H (1990) Perceptual linear predictive (PLP) analysis for speech. J Acoust Soc Am 87: 1738–1752
Article Google Scholar
Hermansky H, Morgan N, Bayya A, Kohn P (1991) Rasta-PLP speech analysis
Hunt MJ (1979) A statistical approach to metrics for word and syllable recognition. J Acoust Soc Am 66: S535–S536
Article Google Scholar
Stern RM, Raj B, Moreno PJ (1997) Compensation for environmental degradation in automatic speech recognition. In: Proceedings of ESCA-NATO Tutorial Research Workshop Robust Speech Recognition for Unknown Communication Channels, pp 33–42
Moreno PJ, Raj B, Stern RM (1998) Data-driven environmental compensation for speech recognition: a unified approach. Speech Commun 24: 267–285
Article Google Scholar
Moreno PJ, Raj B, Stern RM (1996) A vector Taylor series approach for environment-independent speech recognition. In: Proceedings of ICASSP-96, pp 733–736
Moreno PJ (1996) Speech recognition in noisy environments. Ph.D. Dissertation, ECE Department, CMU
Raj B, Gouvea EB, Moreno PJ, Stern RM (1996) Cepstral compensation by polynomial approximation for environment-independent speech recognition. In: Proceedings of the International Conference on Spoken Language Processing, Philadelphia, pp 2340–2343
Kim NS (1998) Statistical linear approximation for environment compensation. IEEE Signal Process Lett 5(1): 8–10
Article Google Scholar
Han ZB, Zhang SW, Zhang HY, Xu B (2003) A vector statistical piecewise polynomial approximation algorithm for environment compensation in telephone LVCSR. In: Proceedings of ICASSP-2003, pp 117–120
Kim NS (1998) Nonstationary environment compensation based on sequential estimation. IEEE Signal Process Lett 5(3): 8–10
Google Scholar
Deng L, Droppo J, Acero A (2001) Recursive noise estimation using iterative stochastic approximation for stereo-based robust speech recognition. In: Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding, pp 81–84
Shen HF, Liu G, Guo J, Li QX (2005) Two-domain feature compensation for robust speech recognition. In: Proceedings of the ISNN-2005, Lecture Notes in Computer Science, vol 3497, Advance in Neural Network-ISNN 2005, Springer, pp 351–356
Shen HF, Liu G, Guo J, Huang PM, Li QX (2005) Environment compensation based on maximum a posteriori estimation for improved speech recognition. In: Proceedings of the MICAI-2005, Lecture Notes in Artificial Intelligence, vol 3789, MICAI 2005. Advances in Artificial Intelligence, Springer, pp 854–862 (2005)
Couvreur C, Hamme HV (2000) Model-based feature enhancement for noisy speech recognition. In: Proceedings of the ICASSP-2000, vol 3, pp 1719–1722
Stouten V, Hamme HV, Demuynck K, Wambacq P (2003) Robust speech recognition using model-based feature enhancement. In: Proceedings of the Eurospeech 2003, pp 17–20
Normandin Y, Cardin R, Mori RD (1994) High-performance connected digit recognition using maximum mutual information estimation. IEEE Trans. Speech Audio Process 2(2):299–311
Google Scholar
Nadas A, Nahamoo D, Picheny MA (1998) On a model-robust training method for speech recognition. IEEE Trans Acoust Speech Signal Process 36(9): 1432–1436
Article Google Scholar
Ephraim Y, Dembo A, Rabiner L (1989) A Minimum discrimination information approach for hidden markov modeling. IEEE Trans Inf Theory 35(5): 1001–1013
Article MATH MathSciNet Google Scholar
Gauvain JL, Lee CH (1994) Maximum a posteriori estimation for multivariate Gaussian mixture observation of Markov chains. IEEE Trans Speech Audio Process 2(2): 291–298
Article Google Scholar
Huo Q, Lee CH (1997) On-line adaptive learning of the continuous density hidden Markov model based on approximate recursive Bayes estimate. IEEE Trans Speech Audio Process 5(2): 161–172
Article Google Scholar
Huo Q, Chan C, Lee CH (1995) Bayesian adaptive learning of the parameters of hidden Markov model for speech recognition. IEEE Trans Speech Audio Process 3(5):334–3 (1995)
Google Scholar
Huo Q, Chan C, Lee CH (1994) Bayesian learning of the SCHMM parameters for speech recognition. In: Proceedings of the ICASSP-94, vol 1, pp 221–224
Huo Q, Lee CH (1995) A study of on-line quasi-Bayes adaptation for CDHMM-based speech recognition. In: Proceedings of the ICASSP-96, vol 2, pp 705–708
Acero A, Deng L, Kristjansson K, Zhang J (2000) HMM adaptation using vector Taylor series for noisy speech recognition. In: Proceedings of the ICSLP 2000
SagaYama S, Yamaguchi Y, Takahashi S, Takahashi J (1997) Jacobian approach to fast acoustic model adaptation. In: Proceedings of the ICASSP-97, Munich, pp 835–838
Cerisara C, Rigazio L, JunQua JC (2004) α-Jacobian environmental adaptation. Speech Commun 42: 25–41
Article Google Scholar
Gales MJF (1995) Model-based techniques for noise robust speech recognition. Ph.D. thesis, University of Cambridge
Gales MJF, Young SJ (1993) Cepstral parameter compensation for HMM recognition in noise. Speech Commun 12: 231–239
Article Google Scholar
Gales MJF, Young SJ (1995) A fast and flexible implementation of parallel model combination. In: Proceedings of the ICASSP-95, Detroit, pp 133–136
Komori Y, Kosaka T, Yamamoto, H, Yamada M (1997) Fast parallel model combination noise adaptation processing. In: Proceedings of the Eurospeech-97, Rhodes, Greece, pp 1523–1526
Gales MJF (1998) Predictive model-based compensation schemes for robust speech recognition. Speech Commun 25: 49–74
Article Google Scholar
Hwang TH, Wang HC (2000) A fast algorithm for parallel model combination for noisy speech recognition. Comput Speech Lang 14: 81–100
Article Google Scholar
Shen HF, Li QX, Guo J, Liu G (2005) HMM parameter adaptation using the truncated first-order VTS and EM algorithm for robust speech recognition. In: Proceedings of the CIS-2005, Lecture Notes in Artificial Intelligence, vol 3801, Springer, pp 979–984
Kim NS, Lim W, Stern RM (2005) Feature compensation based on switching linear dynamic model. IEEE Signal Process Lett 12(6): 473–476
Article Google Scholar
Droppo J, Acero A (2004) Noise robust speech recognition with a switching linear dynamic model. In: Proceedings of the ICASSP-2004, Montreal, QC, Canada, pp 953–956
Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc B 39(1): 1–38
MATH MathSciNet Google Scholar
Zu YQ Issues in the scientific design of the continuous speech database. Available at: http://www.cass.net.cn/chinese/s18_yys/yuyin/report/report_1998.htm
Varga A, Steenneken HJM, Tomilson M, Jones D (1992) The NOISEX–92 study on the effect of additive noise on automatic speech recognition. Documentation on the NOISEX-92 CD-ROMs
ETSI standard document. Speech Processing, Transmission and Quality aspects (STQ); Distributed speech recognition; Advanced front-end feature extraction algorithm; Compression algorithm. ETSI document ES 202 050 v1.1.3 (2003-11), Nov. 2003

Download references

Author information

Authors and Affiliations

Beijing University of Posts and Telecommunications, Beijing, 100876, China
Haifeng Shen, Gang Liu & Jun Guo

Authors

Haifeng Shen
View author publications
You can also search for this author in PubMed Google Scholar
Gang Liu
View author publications
You can also search for this author in PubMed Google Scholar
Jun Guo
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Haifeng Shen.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Shen, H., Liu, G. & Guo, J. Two-stage model-based feature compensation for robust speech recognition. Computing 94, 1–20 (2012). https://doi.org/10.1007/s00607-011-0152-1

Download citation

Received: 03 May 2009
Accepted: 08 July 2011
Published: 25 August 2011
Issue Date: January 2012
DOI: https://doi.org/10.1007/s00607-011-0152-1

Keywords

Mathematics Subject Classification (2000)

68T10

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Two-stage model-based feature compensation for robust speech recognition

Abstract

Access this article

Similar content being viewed by others

Feature compensation based on independent noise estimation for robust speech recognition

Investigation and Development of Methods for Improving Robustness of Automatic Speech Recognition Algorithms in Complex Acoustic Environments

Spectral Reconstruction and Noise Model Estimation Based on a Masking Model for Noise Robust Speech Recognition

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Keywords

Mathematics Subject Classification (2000)

Navigation

Two-stage model-based feature compensation for robust speech recognition

Abstract

Access this article

Similar content being viewed by others

Feature compensation based on independent noise estimation for robust speech recognition

Investigation and Development of Methods for Improving Robustness of Automatic Speech Recognition Algorithms in Complex Acoustic Environments

Spectral Reconstruction and Noise Model Estimation Based on a Masking Model for Noise Robust Speech Recognition

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Mathematics Subject Classification (2000)

Search

Navigation