A noise-robust front-end for distributed speech recognition in mobile communications

International Journal of Speech Technology

Abstract

This paper investigates a new front-end processing approach that aims to improve speech recognition performance in noisy mobile environments. The approach combines conventional Mel-cepstral coefficients (MFCCs), Line Spectral Frequencies (LSFs), and formant-like (FL) features into robust multivariate feature vectors. The resulting front-end is an alternative to the DSR-XAFE (XAFE: eXtended Audio Front-End) available in GSM mobile communications. Our results show that, for highly noisy speech, combining these spectral cues leads to a significant improvement in recognition accuracy on the Aurora 2 task.
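
As a rough illustration of the kind of feature combination the abstract describes, the sketch below derives LSFs from a set of linear prediction coefficients and stacks them with MFCC and formant-like values into one per-frame vector. It is a minimal sketch under stated assumptions, not the authors' front-end: the function names `lpc_to_lsf` and `combine_features`, the simple concatenation, and all numeric values are hypothetical, and the abstract does not say how the three streams are actually merged.

```python
import numpy as np

def lpc_to_lsf(a):
    """Convert LPC coefficients [1, a1, ..., ap] to line spectral
    frequencies in radians, using the standard sum/difference
    polynomial construction. Assumes a minimum-phase A(z)."""
    a = np.asarray(a, dtype=float)
    a_ext = np.concatenate([a, [0.0]])
    # P(z) = A(z) + z^-(p+1) A(1/z),  Q(z) = A(z) - z^-(p+1) A(1/z)
    p_poly = a_ext + a_ext[::-1]
    q_poly = a_ext - a_ext[::-1]
    roots = np.concatenate([np.roots(p_poly), np.roots(q_poly)])
    angles = np.angle(roots)
    # Keep one root of each conjugate pair; drop trivial roots at z = +1, -1.
    eps = 1e-6
    return np.sort(angles[(angles > eps) & (angles < np.pi - eps)])

def combine_features(mfcc, lsf, formants):
    """Hypothetical combination step: plain concatenation of the
    three per-frame feature sets into one vector."""
    return np.concatenate([mfcc, lsf, formants])

# Illustrative values only (not taken from the paper).
lpc = [1.0, -0.9, 0.2]             # order-2 predictor, roots at 0.5 and 0.4
mfcc = np.zeros(13)                # placeholder for 13 cepstral coefficients
formants = np.array([0.3, 1.1])    # placeholder formant-like values

lsf = lpc_to_lsf(lpc)
frame_vector = combine_features(mfcc, lsf, formants)
```

For an order-p predictor the function returns p frequencies in ascending order, so the combined vector here has 13 + 2 + 2 = 17 entries. A real system would compute all three feature sets from the same windowed speech frame.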

Author information

Corresponding author

Correspondence to Sid-Ahmed Selouani.

About this article

Cite this article

Addou, D., Selouani, SA., Kifaya, K. et al. A noise-robust front-end for distributed speech recognition in mobile communications. Int J Speech Technol 10, 167–173 (2007). https://doi.org/10.1007/s10772-009-9025-9

  • DOI: https://doi.org/10.1007/s10772-009-9025-9
