
Efficient Noise Robust Feature Extraction Algorithms for Distributed Speech Recognition (DSR) Systems

International Journal of Speech Technology

Abstract

The development of robust speech recognition systems that maintain high recognition accuracy in difficult and dynamically varying acoustic environments is becoming increasingly important as speech recognition technology becomes an integral part of mobile applications. In a distributed speech recognition (DSR) architecture, the recogniser's front-end is located in the terminal and is connected over a data network to a remote back-end recognition server. The terminal performs feature parameter extraction (the front-end of the speech recognition system), and the resulting features are transmitted over a data channel to the remote back-end recogniser. DSR offers particular benefits for mobile-device applications: improved recognition performance compared with recognising speech transmitted over the voice channel, and ubiquitous access from different networks with a guaranteed level of recognition performance. A feature extraction algorithm integrated into a DSR system must operate in real time and at the lowest possible computational cost.
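
The terminal-side feature extraction described above can be illustrated with a conventional mel-cepstrum front-end. This is a minimal, dependency-free sketch, not the front-end developed in the paper or any standardised DSR front-end: the 8 kHz sampling rate, 25 ms frames with a 10 ms shift, the 0.97 pre-emphasis coefficient, 23 mel filters and 13 cepstral coefficients are assumed typical values for telephone speech, and the names `fft`, `mel_filterbank`, `mfcc_frame` and `terminal_front_end` are hypothetical.

```python
import cmath
import math

# Assumed front-end parameters for 8 kHz telephone speech (illustrative only)
SAMPLE_RATE = 8000   # Hz
FRAME_LEN = 200      # samples (25 ms)
FRAME_SHIFT = 80     # samples (10 ms)
NFFT = 256           # FFT size (power of two, >= FRAME_LEN)
NUM_MEL = 23         # triangular mel filters
NUM_CEPS = 13        # cepstral coefficients kept per frame
PRE_EMPH = 0.97      # pre-emphasis coefficient


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddled = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddled
        out[k + n // 2] = even[k] - twiddled
    return out


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank():
    """Triangular filters spaced uniformly on the mel scale from 0 Hz to Nyquist."""
    top = hz_to_mel(SAMPLE_RATE / 2)
    edges = [
        int((NFFT + 1) * mel_to_hz(i * top / (NUM_MEL + 1)) / SAMPLE_RATE)
        for i in range(NUM_MEL + 2)
    ]
    bank = []
    for j in range(1, NUM_MEL + 1):
        left, centre, right = edges[j - 1], edges[j], edges[j + 1]
        weights = [0.0] * (NFFT // 2 + 1)
        for k in range(left, centre):      # rising slope
            weights[k] = (k - left) / (centre - left)
        for k in range(centre, right):     # falling slope
            weights[k] = (right - k) / (right - centre)
        bank.append(weights)
    return bank


FILTERS = mel_filterbank()


def mfcc_frame(frame):
    """Pre-emphasis, Hamming window, power spectrum, log mel energies, DCT-II."""
    emph = [frame[0]] + [frame[i] - PRE_EMPH * frame[i - 1] for i in range(1, len(frame))]
    n = len(emph)
    windowed = [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1))) for i, s in enumerate(emph)]
    spectrum = fft(windowed + [0.0] * (NFFT - n))
    power = [abs(spectrum[k]) ** 2 for k in range(NFFT // 2 + 1)]
    # Floor the filter outputs before the log so silent frames stay finite
    log_mel = [math.log(max(sum(w * p for w, p in zip(f, power)), 1e-10)) for f in FILTERS]
    return [
        sum(log_mel[j] * math.cos(math.pi * i * (j + 0.5) / NUM_MEL) for j in range(NUM_MEL))
        for i in range(NUM_CEPS)
    ]


def terminal_front_end(samples):
    """Split the signal into overlapping frames and return one cepstral vector per frame."""
    return [
        mfcc_frame(samples[start:start + FRAME_LEN])
        for start in range(0, len(samples) - FRAME_LEN + 1, FRAME_SHIFT)
    ]


if __name__ == "__main__":
    tone = [math.sin(2 * math.pi * 440 * t / SAMPLE_RATE) for t in range(SAMPLE_RATE)]
    features = terminal_front_end(tone)
    print(len(features), len(features[0]))  # 98 13
```

With a 10 ms shift the terminal produces about 100 vectors of 13 coefficients per second, and it is this compact stream, rather than the 8000 samples per second of raw audio, that a DSR client would send over the data channel. The recursive FFT only keeps the sketch self-contained; a real terminal would use an optimised FFT routine.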

In this paper, two innovative front-end processing techniques for noise-robust speech recognition are presented and compared: time-domain frame attenuation (TD-FrAtt) and frequency-domain frame attenuation (FD-FrAtt). These techniques include different forms of frame attenuation, an improved spectral subtraction algorithm based on minimum statistics, and a mel-cepstrum feature extraction procedure. Tests are performed on the Slovenian SpeechDat II fixed telephone database and the Aurora 2 database using the HTK speech recognition toolkit. The results are especially encouraging for mobile DSR systems with limited memory and processing power.
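
The general idea behind spectral subtraction based on minimum statistics can be sketched as follows: the noise power in each frequency bin is estimated as the minimum of a recursively smoothed power spectrum over a sliding window of recent frames, so the estimate follows the noise floor without an explicit speech/pause detector, and that estimate is then subtracted from the noisy power spectrum. This is a simplified illustration of the generic technique (no bias compensation or adaptive smoothing), not the improved variant developed in the paper; the smoothing factor, window length, over-subtraction factor and spectral floor are assumed values, and `min_stats_spectral_subtraction` is a hypothetical name.

```python
from collections import deque

# Illustrative parameters (assumptions, not values from the paper)
ALPHA = 0.9      # recursive smoothing factor for the power estimate
WINDOW = 50      # frames searched for the minimum (0.5 s at a 10 ms frame shift)
OVERSUB = 2.0    # over-subtraction factor applied to the noise estimate
FLOOR = 0.01     # spectral floor as a fraction of the noisy power


def min_stats_spectral_subtraction(noisy_power):
    """Enhance a sequence of power spectra.

    noisy_power: list of frames, each a list of per-bin powers |Y(k)|^2.
    Returns the enhanced power spectra, one list per frame.
    """
    smoothed = None
    history = deque(maxlen=WINDOW)   # most recent smoothed spectra
    enhanced = []
    for frame in noisy_power:
        # First-order recursive smoothing of the noisy power in each bin
        if smoothed is None:
            smoothed = list(frame)
        else:
            smoothed = [ALPHA * s + (1 - ALPHA) * y for s, y in zip(smoothed, frame)]
        history.append(smoothed)
        # Noise estimate: per-bin minimum of the smoothed power over the window
        noise = [min(column) for column in zip(*history)]
        # Over-subtract the noise estimate, then clamp to a small spectral floor
        enhanced.append([max(y - OVERSUB * n, FLOOR * y) for y, n in zip(frame, noise)])
    return enhanced


if __name__ == "__main__":
    # Stationary unit-power noise in 8 bins, plus a strong "speech" burst in bin 3
    frames = [[1.0] * 8 for _ in range(100)]
    for t in range(60, 70):
        frames[t][3] += 100.0
    out = min_stats_spectral_subtraction(frames)
    print(out[10][0], out[65][3])  # 0.01 99.0
```

Because the burst is shorter than the search window, the minimum still reflects the noise-only frames, so the burst survives subtraction while noise-only bins are pushed down to the floor. A longer window is more robust to long speech segments but slower to follow rising noise levels, and the floor keeps the enhanced power non-negative so later log-mel stages stay defined.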




Cite this article

Kotnik, B., Vlaj, D. & Horvat, B. Efficient Noise Robust Feature Extraction Algorithms for Distributed Speech Recognition (DSR) Systems. International Journal of Speech Technology 6, 205–219 (2003). https://doi.org/10.1023/A:1023410018862
