Bayesian on-line spectral change point detection: a soft computing approach for on-line ASR

Chowdhury, M. F. R.; Selouani, S.-A.; O’Shaughnessy, D.

doi:10.1007/s10772-011-9116-2

Published: 11 October 2011

Volume 15, pages 5–23, (2012)

International Journal of Speech Technology

Abstract

Current automatic speech recognition (ASR) systems work in off-line mode and need prior knowledge of stationary or quasi-stationary test conditions to reach their expected word recognition accuracy. These requirements limit the use of ASR in real-world applications, where test conditions are highly non-stationary and not known a priori. This paper presents an innovative frame-dynamic rapid adaptation and noise compensation technique for tracking highly non-stationary noise, and applies it to on-line ASR. The proposed algorithm is based on a soft computing model that uses Bayesian on-line inference for spectral change point detection (BOSCPD) in unknown non-stationary noise. BOSCPD is tested with the MCRA noise tracking technique for rapid on-line learning of environmental changes in different non-stationary noise scenarios. The test results show that BOSCPD significantly reduces the delay in spectral change point detection compared to the baseline MCRA and its derivatives. The BOSCPD soft computing model is also tested for on-line ASR based on joint additive and channel distortion compensation (JAC) in unknown test conditions, using non-stationary noisy speech samples from the Aurora 2 speech database. The simulation results for on-line ASR show a significant improvement in recognition accuracy over the baseline Aurora 2 distributed speech recognition (DSR) in batch mode.
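To make the idea of Bayesian on-line change point detection concrete, the following is a minimal sketch of the generic run-length recursion described by Adams and MacKay (2007), applied to a single scalar sequence such as the log-energy of one frequency bin. It is not the BOSCPD algorithm of this paper, whose details are not given in the abstract. The Gaussian model with known observation variance, the constant hazard rate, and all names and default values (`hazard`, `mu0`, `var0`, `obs_var`) are assumptions chosen for illustration.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def bocpd_run_lengths(samples, hazard=0.02, mu0=0.0, var0=100.0, obs_var=1.0):
    """Return the most probable run length after each sample.

    Illustrative assumptions: each segment has an unknown constant level
    with prior N(mu0, var0), observations add Gaussian noise of known
    variance obs_var, and a change point occurs at each step with
    constant probability `hazard`.
    """
    run_probs = [1.0]      # P(run length = r | data so far), r = 0, 1, ...
    means = [mu0]          # posterior mean of the level, one per run length
    variances = [var0]     # posterior variance of the level, one per run length
    map_run_lengths = []

    for x in samples:
        # Predictive density of x under each run-length hypothesis.
        pred = [gaussian_pdf(x, m, v + obs_var) for m, v in zip(means, variances)]

        # Growth: the current run continues (no change point).
        growth = [p * q * (1.0 - hazard) for p, q in zip(run_probs, pred)]
        # Change point: the run length resets to zero.
        reset = sum(p * q * hazard for p, q in zip(run_probs, pred))

        # Normalise to obtain the run-length posterior.
        joint = [reset] + growth
        total = sum(joint)
        run_probs = [j / total for j in joint]

        # Conjugate update of the level estimate for each continuing run;
        # the run-length-zero hypothesis restarts from the prior.
        new_vars = [1.0 / (1.0 / v + 1.0 / obs_var) for v in variances]
        new_means = [nv * (m / v + x / obs_var)
                     for nv, m, v in zip(new_vars, means, variances)]
        means = [mu0] + new_means
        variances = [var0] + new_vars

        map_run_lengths.append(max(range(len(run_probs)), key=run_probs.__getitem__))

    return map_run_lengths
```

For a sequence that stays near one level and then jumps to another, the most probable run length grows by one per sample and drops back to a small value shortly after the jump. That drop is what marks the change point on-line, without waiting for the whole utterance. The sketch keeps every run-length hypothesis, so its cost grows with the sequence length; a practical system would prune hypotheses with negligible probability.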

Author information

Authors and Affiliations

INRS-EMT, Université du Québec, Montréal, QC, Canada
M. F. R. Chowdhury & D. O’Shaughnessy
Université de Moncton, Campus de Shippagon, Moncton, NB, Canada
S.-A. Selouani

Corresponding author

Correspondence to M. F. R. Chowdhury.

About this article

Cite this article

Chowdhury, M.F.R., Selouani, SA. & O’Shaughnessy, D. Bayesian on-line spectral change point detection: a soft computing approach for on-line ASR. Int J Speech Technol 15, 5–23 (2012). https://doi.org/10.1007/s10772-011-9116-2

Received: 17 June 2011
Accepted: 08 September 2011
Published: 11 October 2011
Issue Date: March 2012
DOI: https://doi.org/10.1007/s10772-011-9116-2
