Abstract
This paper investigates a new front-end processing scheme that aims to improve speech recognition performance in noisy mobile environments. The approach combines conventional Mel-frequency cepstral coefficients (MFCCs), line spectral frequencies (LSFs) and formant-like (FL) features to form robust multivariate feature vectors. The resulting front-end is an alternative to the DSR-XAFE (XAFE: eXtended Audio Front-End) available for GSM mobile communications. Our results show that, for highly noisy speech, combining these spectral cues significantly improves recognition accuracy on the Aurora 2 task.
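The abstract does not say how the three feature streams are merged. The pure-Python code below is a minimal sketch that assumes plain frame-level concatenation and per-frame LPC coefficients that have already been estimated. It obtains LSFs from the predictor polynomial A(z) using the usual sum and difference polynomials P(z) = A(z) + z^-(p+1) A(1/z) and Q(z) = A(z) - z^-(p+1) A(1/z). For a minimum-phase A(z), the zeros of both polynomials lie on the unit circle. The code then appends the LSFs to MFCC and formant-like vectors. All function names (`lpc_to_lsf`, `lsf_to_hz`, `combine_frames`) and the grid size are hypothetical illustration choices, not the authors' implementation. MFCC and formant-like extraction are not shown.

```python
import cmath
import math


def _unit_circle_real(coeffs, omega, antisymmetric):
    """Evaluate sum_k c_k * exp(-j*omega*k) and remove the linear-phase term.

    For a palindromic polynomial the result is real; for an anti-palindromic
    one it is purely imaginary, so its imaginary part is returned instead.
    """
    degree = len(coeffs) - 1
    value = sum(c * cmath.exp(-1j * omega * k) for k, c in enumerate(coeffs))
    value *= cmath.exp(1j * omega * degree / 2)
    return value.imag if antisymmetric else value.real


def _zeros_in_0_pi(coeffs, antisymmetric, grid=4096, iters=60):
    """Locate zeros in the open interval (0, pi): grid sign changes + bisection."""
    def f(w):
        return _unit_circle_real(coeffs, w, antisymmetric)

    zeros = []
    prev_w = math.pi / grid
    prev_f = f(prev_w)
    for i in range(2, grid):
        w = math.pi * i / grid
        cur_f = f(w)
        if prev_f == 0.0:
            zeros.append(prev_w)           # exact hit on a grid point
        elif prev_f * cur_f < 0.0:
            lo, hi, f_lo = prev_w, w, prev_f
            for _ in range(iters):         # bisection keeps the zero inside [lo, hi]
                mid = 0.5 * (lo + hi)
                f_mid = f(mid)
                if f_lo * f_mid <= 0.0:
                    hi = mid
                else:
                    lo, f_lo = mid, f_mid
            zeros.append(0.5 * (lo + hi))
        prev_w, prev_f = w, cur_f
    return zeros


def lpc_to_lsf(a):
    """Convert LPC coefficients [1, a1, ..., ap] of A(z) into p LSFs (radians).

    Assumes A(z) is minimum phase (e.g. from the autocorrelation method), so
    the zeros of P(z) and Q(z) lie on the unit circle and interlace.
    """
    if not a or a[0] != 1:
        raise ValueError("expected coefficients [1, a1, ..., ap]")
    ext = list(a) + [0.0]          # A(z) padded to degree p+1
    rev = ext[::-1]                # coefficients of z^-(p+1) * A(1/z)
    p_poly = [x + y for x, y in zip(ext, rev)]   # P(z): palindromic
    q_poly = [x - y for x, y in zip(ext, rev)]   # Q(z): anti-palindromic
    # The trivial zeros at omega = 0 and omega = pi are excluded by the search.
    return sorted(_zeros_in_0_pi(p_poly, False) + _zeros_in_0_pi(q_poly, True))


def lsf_to_hz(lsf, fs):
    """Map LSFs in radians to Hz for sampling rate fs."""
    return [w * fs / (2 * math.pi) for w in lsf]


def combine_frames(mfcc_frames, lsf_frames, fl_frames):
    """Concatenate per-frame MFCC, LSF and formant-like vectors (assumed scheme)."""
    if not (len(mfcc_frames) == len(lsf_frames) == len(fl_frames)):
        raise ValueError("all feature streams need the same number of frames")
    return [list(m) + list(l) + list(f)
            for m, l, f in zip(mfcc_frames, lsf_frames, fl_frames)]


if __name__ == "__main__":
    # A 2nd-order predictor; its LSFs are acos(0.7) and acos(0.2) rad.
    lsf = lpc_to_lsf([1.0, -0.9, 0.5])
    print([round(x, 4) for x in lsf])  # prints [0.7954, 1.3694]
```

The grid search with bisection keeps the sketch free of dependencies. A production front-end would more likely use a polynomial root finder or a Chebyshev-series search. Any normalization or weighting of the streams before concatenation is left open here.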
Cite this article
Addou, D., Selouani, SA., Kifaya, K. et al. A noise-robust front-end for distributed speech recognition in mobile communications. Int J Speech Technol 10, 167–173 (2007). https://doi.org/10.1007/s10772-009-9025-9