Abstract
The development of robust speech recognition systems that maintain high recognition accuracy in difficult, dynamically varying acoustical environments is becoming increasingly important as speech recognition technology becomes an integral part of mobile applications. In a distributed speech recognition (DSR) architecture, the recogniser's front-end is located in the terminal and is connected over a data network to a remote back-end recognition server: the terminal performs the feature parameter extraction and transmits the resulting features over a data channel to the remote back-end recogniser. For mobile-device applications, DSR offers particular benefits, such as improved recognition performance compared with transmission over the voice channel, and ubiquitous access from different networks with a guaranteed level of recognition performance. A feature extraction algorithm integrated into a DSR system is required to operate in real time and at the lowest possible computational cost.
In this paper, two innovative front-end processing techniques for noise robust speech recognition are presented and compared: time-domain frame-attenuation (TD-FrAtt) and frequency-domain frame-attenuation (FD-FrAtt). These techniques combine different forms of frame attenuation with an improved spectral subtraction algorithm based on minimum statistics and a mel-cepstrum feature extraction procedure. Tests are performed on the Slovenian SpeechDat II fixed telephone database and the Aurora 2 database using the HTK speech recognition toolkit. The results obtained are especially encouraging for mobile DSR systems with limited memory and processing power.
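To illustrate the kind of front-end processing the abstract refers to, the following is a minimal sketch of spectral subtraction with a minimum-statistics noise estimate in the spirit of Martin (1994). It is not the authors' exact algorithm: the function names, the window length, and the oversubtraction and spectral-floor parameters are illustrative assumptions.

```python
import numpy as np

def minimum_statistics_noise(power_spec, win=8):
    """Estimate the noise power floor per frequency bin as the minimum
    frame power within a sliding window of `win` past frames
    (minimum-statistics style, no explicit speech/pause detection)."""
    n_frames, _ = power_spec.shape
    noise = np.empty_like(power_spec)
    for t in range(n_frames):
        lo = max(0, t - win + 1)
        noise[t] = power_spec[lo:t + 1].min(axis=0)
    return noise

def spectral_subtraction(frames, over=2.0, floor=0.01):
    """Subtract the estimated noise power spectrum from each windowed
    frame. `over` is an oversubtraction factor; `floor` keeps a small
    fraction of the noisy power as a spectral floor to limit musical
    noise. Returns the enhanced complex spectra (noisy phase kept)."""
    spec = np.fft.rfft(frames * np.hanning(frames.shape[1]), axis=1)
    power = np.abs(spec) ** 2
    noise = minimum_statistics_noise(power)
    clean = np.maximum(power - over * noise, floor * power)
    return np.sqrt(clean) * np.exp(1j * np.angle(spec))
```

In a full front-end of this kind, the enhanced spectra would then be passed through a mel filterbank and a DCT to produce the mel-cepstral features transmitted to the back-end recogniser.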
References
Andrassy, B., Vlaj, D., and Beaugeant, C. (2001). Recognition performance of the Siemens front-end with and without frame dropping on the Aurora 2 database. EUROSPEECH 2001 Proceedings. Aalborg, Denmark, pp. 193-196.
Benitez, C., Burget, L., Chen, B., Dupont, S., Garudadri, H., Hermansky, H., Jain, P., Kajarekar, S., Morgan, N., and Sivadas, S. (2001). Robust ASR front-end using spectral-based and discriminant features: Experiments on the Aurora tasks. EUROSPEECH 2001 Proceedings. Aalborg, Denmark, pp. 429-432.
Boll, S.F. (1979). Suppression of acoustic noise in speech using spectral subtraction. IEEE Transactions on Acoustics, Speech and Signal Processing, 27(2):113-120.
COST 249 SpeechDat SIG (2000). The RefRec Homepage. http://www.telenor.no/fou/prosjekter/taletek/refrec/
Deller, J.R., Proakis, J.G., and Hansen, J.H.L. (1993). Discrete-Time Processing of Speech Signals. New York, USA: Macmillan Publishing Company.
ETSI standard document (2000). Speech processing, transmission and quality aspects (STQ), distributed speech recognition, front-end feature extraction algorithm, compression algorithm. ETSI ES 201 108 v1.1.1 (2000-02). Sophia Antipolis, France.
ETSI-SMG technical specification (1994). European digital cellular telecommunication system (Phase 1): Transmission planning aspects for the speech service in GSM PLMN system. GSM 03.50, version 3.4.0. Sophia Antipolis, France.
Hirsch, H.G. and Pearce, D. (2000). The Aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions. ISCA ITRW ASR 2000 Proceedings. Paris, France.
ITU recommendation G.712 (1996). Transmission performance characteristics of pulse code modulation channels. Geneva, Switzerland.
ITU recommendation G.723.1 A (1996). Dual rate speech coder for multimedia communications transmitting at 5.3 and 6.3 kbit/s. Annex A: Silence compression scheme. Geneva, Switzerland.
Junqua, J.-C. and Haton, J.-P. (1996). Robustness in Automatic Speech Recognition. Kluwer Academic Publishers. Norwell, Massachusetts, USA.
Kaiser, J. and Kačič, Z. (1997). SpeechDat II Slovenian Database for the Fixed Telephone Network. Maribor, Slovenia: University of Maribor.
Kotnik, B., Rotovnik, T., Kačič, Z., and Horvat, B. (2001a). The design of mobile multimodal communication device-personal navigator. EUROCON 2001 Proceedings, Bratislava, Slovakia, pp. 337-340.
Kotnik, B., Kačič, Z., and Horvat, B. (2001b). A multiconditional robust front-end feature extraction with a noise reduction procedure based on improved spectral subtraction algorithm. EUROSPEECH 2001 Proceedings. Aalborg, Denmark, pp. 197-200.
Leonard, R.G. (1991). A Speaker-Independent Connected-Digit Database. Texas Instruments Inc., Dallas, Texas, USA.
Lindberg, B., Johansen, F.T., Warakagoda, N., Lehtinen, G., Kačič, Z., Zgank, A., Elenius, K., and Salvi, G. (2000). A noise robust multilingual reference recogniser based on SpeechDat II. ICSLP 2000 Proceedings. Beijing, China. Paper No. 01775.
Martin, R. (1994). Spectral subtraction based on minimum statistics. EUSIPCO 1994 Proceedings. Edinburgh, Scotland, UK, pp. 1182-1185.
Oviatt, S. (2000). Multimodal signal processing in naturalistic noisy environments. ICSLP 2000 Proceedings. Beijing, China, pp. 696-699.
Pearce, D. (2000). An overview of the ETSI standards activities for distributed speech recognition front-ends. AVIOS 2000 Proceedings. San Jose, CA, USA.
Van den Heuvel, H., Boves, L., Moreno, A., Omologo, M., Richard, G., and Sanders, E. (2001). Annotation in the SpeechDat projects. International Journal of Speech Technology, 4(2):127-143.
Varga, A.P. and Moore, R.K. (1990). Hidden Markov model decomposition of speech and noise. ICASSP 1990 Proceedings. Albuquerque, New Mexico, USA, pp. 845-848.
Yapanel, U., Hansen, J.H.L., Sarikaya, R., and Pellom, B. (2001). Robust digit recognition in noise: An evaluation using the AURORA Corpus. EUROSPEECH 2001 Proceedings. Aalborg, Denmark, pp. 209-212.
Young, S. (1997). The HTK Book, Version 2.1. Cambridge, UK: Entropic Cambridge Research Laboratory.
Cite this article
Kotnik, B., Vlaj, D. & Horvat, B. Efficient Noise Robust Feature Extraction Algorithms for Distributed Speech Recognition (DSR) Systems. International Journal of Speech Technology 6, 205–219 (2003). https://doi.org/10.1023/A:1023410018862