Feature selection for robust automatic speech recognition: a temporal offset approach

Trottier, Ludovic; Giguère, Philippe; Chaib-draa, Brahim

doi:10.1007/s10772-015-9276-6

Published: 20 March 2015

International Journal of Speech Technology, Volume 18, pages 395–404 (2015)

Abstract

Automatic speech recognition relies on extracting features at fixed intervals. To enhance these features with dynamic (delta) components, discrete derivatives are usually computed and appended as additional features. However, derivative operations tend to be sensitive to noise. Our proposed method alleviates this problem by replacing these derivatives with nearby features selected on a per-frequency basis. In particular, we observed that at low frequencies consecutive samples are highly correlated, so more information can be gathered by looking at features farther away in time. We therefore propose a strategy to perform this frequency-based selection and evaluate it on the Aurora 2 continuous-digits and connected-digits tasks using standard MFCC, PLPCC and LPCC features. Our experiments show that this strategy achieves an average relative improvement of 32.10% in accuracy, with most of the gains in very noisy environments, where traditional delta features yield low recognition rates.
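The abstract contrasts two ways of adding dynamic information to each frame. The sketch below, in plain Python, places the common regression-based delta (the formula used by HTK, here with an assumed window of ±2 frames and boundary frames replicated at the utterance edges) next to a temporal-offset alternative that appends, for each coefficient, the static value observed a few frames later. The abstract does not state how the authors choose each offset, so `choose_offsets` uses a hypothetical autocorrelation-threshold rule that only illustrates the stated intuition: strongly correlated (slowly varying) coefficients get a larger offset. All function names, the threshold of 0.5 and the maximum offset of 5 frames are assumptions, not the paper's method.

```python
from typing import List

Frames = List[List[float]]  # frames[t][k]: coefficient k of frame t


def _clamp(i: int, n: int) -> int:
    """Replicate the first/last frame when an index falls outside the utterance."""
    return max(0, min(n - 1, i))


def delta_features(frames: Frames, window: int = 2) -> Frames:
    """Regression-based delta:
    d_t = sum_{th=1..W} th * (c_{t+th} - c_{t-th}) / (2 * sum_{th=1..W} th^2)."""
    n = len(frames)
    denom = 2 * sum(th * th for th in range(1, window + 1))
    out = []
    for t in range(n):
        row = []
        for k in range(len(frames[t])):
            num = sum(
                th * (frames[_clamp(t + th, n)][k] - frames[_clamp(t - th, n)][k])
                for th in range(1, window + 1)
            )
            row.append(num / denom)
        out.append(row)
    return out


def choose_offsets(frames: Frames, threshold: float = 0.5, max_offset: int = 5) -> List[int]:
    """HYPOTHETICAL selection rule: for each coefficient, pick the smallest lag
    whose normalized autocorrelation drops below `threshold` (capped at
    `max_offset`). Slowly varying trajectories therefore get larger offsets."""
    n, dim = len(frames), len(frames[0])
    offsets = []
    for k in range(dim):
        x = [f[k] for f in frames]
        mean = sum(x) / n
        xc = [v - mean for v in x]
        var = sum(v * v for v in xc)
        chosen = max_offset  # fallback for constant or very smooth trajectories
        if var > 0:
            for lag in range(1, max_offset + 1):
                r = sum(xc[t] * xc[t + lag] for t in range(n - lag)) / var
                if r < threshold:
                    chosen = lag
                    break
        offsets.append(chosen)
    return offsets


def offset_features(frames: Frames, offsets: List[int]) -> Frames:
    """Temporal-offset features: coefficient k of frame t is replaced by the
    static value of coefficient k at frame t + offsets[k] (edges replicated)."""
    n = len(frames)
    return [
        [frames[_clamp(t + offsets[k], n)][k] for k in range(len(offsets))]
        for t in range(n)
    ]


def augment(frames: Frames, dynamic: Frames) -> Frames:
    """Concatenate static and dynamic parts frame by frame."""
    return [f + d for f, d in zip(frames, dynamic)]
```

For a 13-dimensional MFCC stream, `augment(frames, offset_features(frames, choose_offsets(frames)))` yields 26-dimensional vectors, the same size as `augment(frames, delta_features(frames))`, so the two front ends can be compared at equal dimensionality.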

Author information

Authors and Affiliations

Department of Computer Science and Software Engineering, Laval University, Québec, Canada
Ludovic Trottier, Philippe Giguère & Brahim Chaib-draa

Corresponding author

Correspondence to Ludovic Trottier.

About this article

Cite this article

Trottier, L., Giguère, P. & Chaib-draa, B. Feature selection for robust automatic speech recognition: a temporal offset approach. Int J Speech Technol 18, 395–404 (2015). https://doi.org/10.1007/s10772-015-9276-6

Received: 29 September 2014
Accepted: 04 March 2015
Published: 20 March 2015
Issue Date: September 2015
DOI: https://doi.org/10.1007/s10772-015-9276-6
