Robust Multi-Band ASR Using Deep Neural Nets and Spectro-temporal Features

  • Conference paper
Speech and Computer (SPECOM 2014)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8773)

Abstract

Spectro-temporal feature extraction and multi-band processing were both designed to make speech recognizers more robust. Although both have been in use for a long time, very few attempts have been made to combine them. We therefore integrate two spectro-temporal feature extraction methods into a multi-band framework. We assess the performance of our spectro-temporal feature sets both individually (as a baseline) and in combination with multi-band processing, in phone recognition tasks on clean and noise-contaminated versions of the TIMIT dataset. Our results show that multi-band processing clearly outperforms the baseline feature recombination method in every case tested. This improvement can be enhanced further by using the recently introduced technology of deep neural nets (DNNs).
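To make the idea of combining the two techniques concrete, the following is a minimal sketch of per-band spectro-temporal feature extraction. It rests on several assumptions that are not taken from the abstract: the abstract does not name its two feature extraction methods, so a low-order 2D DCT is used here purely as a stand-in; the function names (`split_bands`, `dct2_lowpass`, `multiband_features`), the even band split, and the `keep` parameter are all hypothetical. The per-band classifiers and the recombination step of a multi-band system are not shown.

```python
import math

def split_bands(spectrogram, n_bands):
    """Split a [channels][frames] spectrogram into contiguous frequency bands."""
    n_channels = len(spectrogram)
    if not 1 <= n_bands <= n_channels:
        raise ValueError("n_bands must be between 1 and the number of channels")
    # Spread channels as evenly as possible; earlier bands absorb the remainder.
    base, extra = divmod(n_channels, n_bands)
    bands, start = [], 0
    for b in range(n_bands):
        size = base + (1 if b < extra else 0)
        bands.append(spectrogram[start:start + size])
        start += size
    return bands

def dct2_lowpass(patch, keep):
    """Return the keep x keep lowest-order 2D DCT-II coefficients (unnormalized)."""
    rows, cols = len(patch), len(patch[0])
    coeffs = []
    for u in range(keep):          # frequency-axis order
        for v in range(keep):      # time-axis order
            total = 0.0
            for i in range(rows):
                for j in range(cols):
                    total += (patch[i][j]
                              * math.cos(math.pi * (i + 0.5) * u / rows)
                              * math.cos(math.pi * (j + 0.5) * v / cols))
            coeffs.append(total)
    return coeffs

def multiband_features(spectrogram, n_bands, keep=2):
    """One feature vector per band (assumed stand-in for the paper's features)."""
    return [dct2_lowpass(band, keep) for band in split_bands(spectrogram, n_bands)]
```

In a multi-band recognizer each of these per-band vectors would feed its own classifier, so that noise confined to one frequency region corrupts only that band's features before the band-level outputs are merged.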




Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Kovács, G., Tóth, L., Grósz, T. (2014). Robust Multi-Band ASR Using Deep Neural Nets and Spectro-temporal Features. In: Ronzhin, A., Potapova, R., Delic, V. (eds) Speech and Computer. SPECOM 2014. Lecture Notes in Computer Science, vol 8773. Springer, Cham. https://doi.org/10.1007/978-3-319-11581-8_48

  • DOI: https://doi.org/10.1007/978-3-319-11581-8_48

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11580-1

  • Online ISBN: 978-3-319-11581-8

  • eBook Packages: Computer Science, Computer Science (R0)
