Abstract
Current automatic speech recognition (ASR) systems work in off-line mode and need prior knowledge of stationary or quasi-stationary test conditions to reach the expected word recognition accuracy. These requirements limit the use of ASR in real-world applications, where test conditions are highly non-stationary and not known a priori. This paper presents a frame-dynamic rapid adaptation and noise compensation technique for tracking highly non-stationary noises, and applies it to on-line ASR. The proposed algorithm is based on a soft computing model that uses Bayesian on-line inference for spectral change point detection (BOSCPD) in unknown non-stationary noises. BOSCPD is tested with the minima controlled recursive averaging (MCRA) noise tracking technique for rapid on-line learning of environmental changes in different non-stationary noise scenarios. The test results show that BOSCPD significantly reduces the delay in spectral change point detection compared to the baseline MCRA and its derivatives. The BOSCPD soft computing model is also tested for on-line ASR based on joint additive and channel distortion compensation (JAC) in unknown test conditions, using non-stationary noisy speech samples from the Aurora 2 speech database. The simulation results for on-line ASR show a significant improvement in recognition accuracy over the baseline Aurora 2 distributed speech recognition (DSR) in batch mode.
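To make the idea of Bayesian on-line change point detection concrete, the sketch below runs the generic run-length recursion of Adams and MacKay (2007) on a scalar sequence with one abrupt level shift. It is not the BOSCPD algorithm of this paper: the Gaussian observation model with known variance, the constant hazard rate, the synthetic signal, and every function and parameter name are assumptions chosen for illustration only.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def map_run_lengths(samples, hazard=0.02, prior_mean=0.0,
                    prior_var=100.0, obs_var=1.0):
    """Return the most probable run length after each sample.

    Assumed model (illustrative): each segment has an unknown mean with a
    Normal(prior_mean, prior_var) prior, observations have known variance
    obs_var, and a change occurs at each step with probability `hazard`.
    """
    run_probs = [1.0]          # P(run length = r) for r = 0..t
    means = [prior_mean]       # posterior mean of the segment level, per r
    variances = [prior_var]    # posterior variance of the segment level, per r
    result = []
    for x in samples:
        # Predictive probability of x under every run-length hypothesis.
        pred = [gaussian_pdf(x, m, v + obs_var)
                for m, v in zip(means, variances)]
        # Growth: the current run continues (run length r -> r + 1).
        growth = [p * q * (1.0 - hazard) for p, q in zip(run_probs, pred)]
        # Change point: the run length resets to zero.
        change = sum(p * q * hazard for p, q in zip(run_probs, pred))
        joint = [change] + growth
        total = sum(joint)
        run_probs = [p / total for p in joint]
        # Conjugate update of the segment-level posterior for each grown run;
        # the new run (r = 0) starts again from the prior.
        new_means, new_vars = [prior_mean], [prior_var]
        for m, v in zip(means, variances):
            post_var = 1.0 / (1.0 / v + 1.0 / obs_var)
            new_means.append(post_var * (m / v + x / obs_var))
            new_vars.append(post_var)
        means, variances = new_means, new_vars
        result.append(max(range(len(run_probs)), key=run_probs.__getitem__))
    return result


# Synthetic signal: level 0 for 50 samples, then level 10 for 50 samples,
# with a small deterministic ripple standing in for noise.
samples = [0.3 * math.sin(i) + (0.0 if i < 50 else 10.0) for i in range(100)]
runs = map_run_lengths(samples)
print(runs[49], runs[50], runs[-1])
```

A sharp drop in the most probable run length marks a detected change. In this toy signal the drop happens on the first sample after the level shift, which is the kind of low detection delay the abstract refers to, although a real spectral detector would operate on per-frame, per-frequency-bin power estimates rather than on a clean scalar level.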
Cite this article
Chowdhury, M.F.R., Selouani, SA. & O’Shaughnessy, D. Bayesian on-line spectral change point detection: a soft computing approach for on-line ASR. Int J Speech Technol 15, 5–23 (2012). https://doi.org/10.1007/s10772-011-9116-2
DOI: https://doi.org/10.1007/s10772-011-9116-2
Keywords
- On-line environment learning
- Bayesian on-line inference for spectral change point detection
- MCRA
- On-line ASR
- JAC compensation
- Non-stationary noise tracking and estimation
- Minimum search window
- Frame dynamic
- DSR
- Highly non-stationary unknown test conditions
- Real-world application
- Smart phones and mobile hand-held devices
- BOSCPD