Feature selection for robust automatic speech recognition: a temporal offset approach

International Journal of Speech Technology

Abstract

Automatic speech recognition relies on extracting features at fixed intervals. To enhance these features with dynamic (delta) components, discrete derivatives are usually computed and appended as features. However, derivative operations are susceptible to noise. Our proposed method alleviates this problem by replacing the derivatives with nearby features selected on a per-frequency basis. In particular, we noted that at low frequencies consecutive samples are highly correlated, so more information can be gathered from features farther away in time. We thus propose a strategy to perform this frequency-based selection and evaluate it on the Aurora 2 continuous-digits and connected-digits tasks using standard MFCC, PLPCC and LPCC features. Our experiments show that the strategy achieved an average relative improvement of \(32.10\,\%\) in accuracy, with most gains in very noisy environments where traditional delta features have low recognition rates.
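
The contrast described in the abstract can be illustrated with a minimal sketch. The `delta` function below uses the common regression-style delta formula, and `offset_features` appends, for each coefficient, the values found a fixed number of frames in the past and in the future. The abstract does not give the selection strategy, so the function names, the symmetric past/future layout, and the per-coefficient offsets are assumptions made for illustration, not the authors' method.

```python
def _clamp(t, num_frames):
    """Clamp a frame index to the valid range (edge frames are repeated)."""
    return min(max(t, 0), num_frames - 1)


def delta(frames, window=2):
    """Regression-style delta features.

    frames: list of frames, each a list of coefficients.
    Returns a list of the same shape holding the estimated time derivative.
    """
    num_frames = len(frames)
    num_coeffs = len(frames[0])
    denom = 2.0 * sum(n * n for n in range(1, window + 1))
    out = []
    for t in range(num_frames):
        row = []
        for k in range(num_coeffs):
            acc = sum(
                n * (frames[_clamp(t + n, num_frames)][k]
                     - frames[_clamp(t - n, num_frames)][k])
                for n in range(1, window + 1)
            )
            row.append(acc / denom)
        out.append(row)
    return out


def offset_features(frames, offsets):
    """Append past and future values instead of derivatives.

    offsets[k] is a hypothetical temporal offset (in frames) for
    coefficient k; it is an assumed input, not a value from the paper.
    Each output frame is [current, past, future] concatenated.
    """
    num_frames = len(frames)
    num_coeffs = len(frames[0])
    out = []
    for t in range(num_frames):
        past = [frames[_clamp(t - offsets[k], num_frames)][k]
                for k in range(num_coeffs)]
        future = [frames[_clamp(t + offsets[k], num_frames)][k]
                  for k in range(num_coeffs)]
        out.append(list(frames[t]) + past + future)
    return out
```

Calling `offset_features(frames, [3, 2, 1])`, for example, would give the first coefficient the largest offset, which mirrors the abstract's observation that low-frequency components change slowly and benefit from features taken farther away in time. Unlike `delta`, this variant performs no subtraction, so it does not amplify frame-to-frame noise in the same way.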

Author information

Corresponding author

Correspondence to Ludovic Trottier.

About this article

Cite this article

Trottier, L., Giguère, P. & Chaib-draa, B. Feature selection for robust automatic speech recognition: a temporal offset approach. Int J Speech Technol 18, 395–404 (2015). https://doi.org/10.1007/s10772-015-9276-6

  • DOI: https://doi.org/10.1007/s10772-015-9276-6
