Bayesian on-line spectral change point detection: a soft computing approach for on-line ASR

International Journal of Speech Technology

Abstract

Current automatic speech recognition (ASR) systems work in off-line mode and need prior knowledge of stationary or quasi-stationary test conditions to reach the expected word recognition accuracy. These requirements limit the use of ASR in real-world applications, where test conditions are highly non-stationary and not known a priori. This paper presents a frame-dynamic rapid adaptation and noise compensation technique for tracking highly non-stationary noises, and applies it to on-line ASR. The proposed algorithm is based on a soft computing model that uses Bayesian on-line inference for spectral change point detection (BOSCPD) in unknown non-stationary noises. BOSCPD is tested with the minima controlled recursive averaging (MCRA) noise tracking technique for rapid on-line learning of environmental changes in different non-stationary noise scenarios. The test results show that the proposed BOSCPD technique significantly reduces the delay in spectral change point detection compared to the baseline MCRA and its derivatives. The BOSCPD soft computing model is also tested for on-line ASR based on joint additive and channel distortion compensation (JAC) in unknown test conditions, using non-stationary noisy speech samples from the Aurora 2 speech database. The simulation results for on-line ASR show a significant improvement in recognition accuracy compared to the baseline Aurora 2 distributed speech recognition (DSR) in batch mode.
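As a rough illustration of the kind of Bayesian on-line inference the abstract refers to, the following Python sketch implements the generic Bayesian online changepoint detection recursion (a run-length posterior with a constant hazard rate) for a scalar Gaussian signal with known variance. It is not the authors' BOSCPD algorithm, which operates on noisy speech spectra; the hazard rate, the prior, the observation variance, the function names and the synthetic data are all assumptions chosen for illustration.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a normal distribution N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def bocpd_map_run_lengths(data, hazard=0.01, mu0=0.0, var0=10.0, obs_var=1.0):
    """Return the most probable run length after each observation.

    Assumed model (illustrative only): observations are N(mean, obs_var),
    the unknown mean has a N(mu0, var0) prior that is reset at every change
    point, and a change point occurs with constant probability `hazard`.
    """
    run_probs = [1.0]    # posterior over run lengths; starts at run length 0
    means = [mu0]        # posterior mean of the segment mean, per run length
    variances = [var0]   # posterior variance of the segment mean, per run length
    map_run_lengths = []

    for x in data:
        # Predictive probability of x under each run-length hypothesis.
        pred = [gaussian_pdf(x, m, v + obs_var) for m, v in zip(means, variances)]

        # Growth: the current run continues. Change: the run length resets to 0.
        growth = [p * q * (1.0 - hazard) for p, q in zip(run_probs, pred)]
        change = sum(p * q * hazard for p, q in zip(run_probs, pred))

        # Normalise to obtain the new run-length posterior.
        new_probs = [change] + growth
        total = sum(new_probs)
        run_probs = [p / total for p in new_probs]

        # Conjugate update of the mean for every grown run; a new run
        # (run length 0) starts again from the prior.
        new_means, new_vars = [mu0], [var0]
        for m, v in zip(means, variances):
            post_var = 1.0 / (1.0 / v + 1.0 / obs_var)
            new_means.append(post_var * (m / v + x / obs_var))
            new_vars.append(post_var)
        means, variances = new_means, new_vars

        best = max(range(len(run_probs)), key=run_probs.__getitem__)
        map_run_lengths.append(best)

    return map_run_lengths


# Synthetic signal (an assumption): the level jumps from about 0 to about 5
# at index 30.
data = [0.1 if i % 2 == 0 else -0.1 for i in range(30)]
data += [5.1 if i % 2 == 0 else 4.9 for i in range(30)]

run_lengths = bocpd_map_run_lengths(data)
# Indices where the most probable run length falls, i.e. detected change points.
drops = [i for i in range(1, len(run_lengths)) if run_lengths[i] < run_lengths[i - 1]]
print(drops)
```

In this toy setting the most probable run length grows by one per sample and falls back to 1 at the first sample after the jump, so the change is flagged without waiting for a long averaging window. That short reaction time is the property the abstract contrasts with the detection delay of recursive-averaging trackers.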

Author information

Corresponding author

Correspondence to M. F. R. Chowdhury.

About this article

Cite this article

Chowdhury, M.F.R., Selouani, SA. & O’Shaughnessy, D. Bayesian on-line spectral change point detection: a soft computing approach for on-line ASR. Int J Speech Technol 15, 5–23 (2012). https://doi.org/10.1007/s10772-011-9116-2

  • DOI: https://doi.org/10.1007/s10772-011-9116-2
