Spectral Analysis for Automatic Speech Recognition and Enhancement

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12629)

Abstract

Accurate recognition of noisy speech signals remains an obstacle to the wider application of speech recognition technology. The robustness of a speech recognition system depends heavily on its ability to handle background noise. This work presents a Short-Time Fourier Transform (STFT) filtering technique for the enhancement and recognition of speech signals. STFT filtering is conventionally applied in speech analysis. This study instead proposes a modified STFT with an adaptive window width based on the chirp rate, termed ASTFT, combined with spectrogram features, for speech recognition and enhancement. The LibriSpeech ASR corpus serves as the benchmark dataset for the experiment. The spectrum of the enhanced speech signal is estimated using several spectrogram features to obtain a unit peak amplitude. A priori Signal-to-Noise Ratio (SNR) estimation is performed on the modified-STFT speech signal and yields an SNR of 31.86 dB, a level at which the speech signal is considered effectively clean.
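
As a rough illustration of the two quantities the abstract relies on, the sketch below computes a short-time Fourier transform and an SNR in decibels. It is not the authors' ASTFT: the window is a fixed-width Hann window rather than one adapted to the chirp rate, the input is a synthetic 1 kHz tone rather than LibriSpeech audio, and the SNR is measured against a known clean reference rather than estimated a priori. The names `hann`, `stft`, `snr_db`, `FRAME_LEN` and `HOP` are all assumptions made for this example.

```python
import cmath
import math
import random

FRAME_LEN = 64  # samples per analysis frame (assumed value)
HOP = 32        # samples between frame starts (assumed value)


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def stft(signal, frame_len=FRAME_LEN, hop=HOP):
    """Fixed-window STFT: one list of frame_len // 2 + 1 complex bins per frame."""
    window = hann(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + i] * window[i] for i in range(frame_len)]
        # Naive DFT of the windowed segment; an FFT would be faster.
        bins = [
            sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len))
            for k in range(frame_len // 2 + 1)
        ]
        frames.append(bins)
    return frames


def snr_db(clean, noisy):
    """SNR in dB, treating (noisy - clean) as the noise component."""
    signal_power = sum(c * c for c in clean)
    noise_power = sum((x - c) ** 2 for c, x in zip(clean, noisy))
    return 10.0 * math.log10(signal_power / noise_power)


# Synthetic stand-in for speech: a 1 kHz tone at 8 kHz plus Gaussian noise.
random.seed(0)
fs = 8000
clean = [math.sin(2.0 * math.pi * 1000.0 * t / fs) for t in range(1024)]
noisy = [c + random.gauss(0.0, 0.05) for c in clean]

spec = stft(noisy)
magnitudes = [abs(b) for b in spec[0]]
peak_bin = magnitudes.index(max(magnitudes))
peak_hz = peak_bin * fs / FRAME_LEN  # frequency of the strongest bin
snr = snr_db(clean, noisy)
```

With these settings each bin spans 125 Hz, so the tone falls exactly on one bin. An adaptive scheme such as the one the abstract describes would vary the window width from frame to frame instead of fixing `FRAME_LEN`.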

Author information

Corresponding author

Correspondence to Serestina Viriri.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Oruh, J., Viriri, S. (2021). Spectral Analysis for Automatic Speech Recognition and Enhancement. In: Renault, É., Boumerdassi, S., Mühlethaler, P. (eds) Machine Learning for Networking. MLN 2020. Lecture Notes in Computer Science, vol 12629. Springer, Cham. https://doi.org/10.1007/978-3-030-70866-5_16

  • DOI: https://doi.org/10.1007/978-3-030-70866-5_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-70865-8

  • Online ISBN: 978-3-030-70866-5

  • eBook Packages: Computer Science, Computer Science (R0)
