Spectral Analysis for Automatic Speech Recognition and Enhancement

Oruh, Jane; Viriri, Serestina

doi:10.1007/978-3-030-70866-5_16

Conference paper
First Online: 03 March 2021

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12629)

Abstract

Accurate recognition of noisy speech signals remains an obstacle to the wider application of speech recognition technology. The robustness of a speech recognition system depends heavily on its ability to handle background noise. This work presents a Short-Time Fourier Transform (STFT) filtering technique for the enhancement and recognition of speech signals. Conventionally, STFT filtering has been applied in speech analysis. In this study, however, a modified STFT with adaptive window width based on the chirp rate, termed ASTFT, is combined with spectrogram features and proposed for optimal speech recognition and enhancement. The LibriSpeech ASR corpus is the benchmark dataset for the experiments. The spectrum of the enhanced speech signal is estimated using several spectrogram features to obtain a unit peak amplitude. A priori signal-to-noise ratio (SNR) estimation is performed on the modified STFT speech signal; it achieves an SNR of 31.86 dB, which is considered to indicate an effectively clean speech signal.
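
The quantities named in the abstract (an STFT magnitude spectrogram, a spectrum scaled to unit peak amplitude, and an SNR in dB) can be illustrated with a minimal sketch. This is not the authors' ASTFT: it uses a fixed Hann window rather than a chirp-rate-adaptive one, a synthetic 440 Hz tone in place of LibriSpeech audio, and a reference-based SNR that assumes the clean signal is known, rather than an a priori SNR estimator. The function names `stft` and `snr_db`, the frame length, and the hop size are all assumptions made for illustration.

```python
import numpy as np

def stft(signal, frame_len=256, hop=128):
    """Plain fixed-window STFT (NOT the adaptive-window ASTFT of the paper).

    Splits the signal into overlapping Hann-windowed frames and returns the
    one-sided FFT of each frame, shape (n_frames, frame_len // 2 + 1).
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    return np.fft.rfft(frames, axis=1)

def snr_db(clean, noisy):
    """Reference-based SNR in dB; assumes the clean signal is available."""
    noise = noisy - clean
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(noise ** 2))

# Synthetic stand-in for a speech recording (hypothetical test signal).
rng = np.random.default_rng(0)
sample_rate = 16000                      # 16 kHz, one second of audio
t = np.arange(sample_rate) / sample_rate
clean = np.sin(2 * np.pi * 440.0 * t)
noisy = clean + 0.01 * rng.standard_normal(sample_rate)

# Magnitude spectrogram, then scaling so the largest value is exactly 1.
magnitude = np.abs(stft(noisy))
unit_peak = magnitude / magnitude.max()

# Frequency bin that carries the most energy on average across frames.
dominant_bin = int(np.argmax(magnitude.mean(axis=0)))
dominant_hz = dominant_bin * sample_rate / 256

measured_snr = snr_db(clean, noisy)
```

With a 256-sample frame at 16 kHz each bin spans 62.5 Hz, so the tone lands in the bin centred at 437.5 Hz. An adaptive scheme such as ASTFT would vary the window width over time instead of fixing `frame_len`.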

Author information

Authors and Affiliations

School of Mathematics, Statistics and Computer Science, University of KwaZulu-Natal, Durban, 4000, South Africa
Jane Oruh & Serestina Viriri

Corresponding author

Correspondence to Serestina Viriri.

Editor information

Editors and Affiliations

Laboratoire LIGM UMR 8049 CNRS, ESIEE Paris, Noisy-le-Grand, France
Éric Renault
CNAM/CEDRIC, Paris, France
Selma Boumerdassi
Inria/EVA Project, Paris, France
Paul Mühlethaler

About this paper

Cite this paper

Oruh, J., Viriri, S. (2021). Spectral Analysis for Automatic Speech Recognition and Enhancement. In: Renault, É., Boumerdassi, S., Mühlethaler, P. (eds) Machine Learning for Networking. MLN 2020. Lecture Notes in Computer Science, vol 12629. Springer, Cham. https://doi.org/10.1007/978-3-030-70866-5_16

DOI: https://doi.org/10.1007/978-3-030-70866-5_16
Published: 03 March 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-70865-8
Online ISBN: 978-3-030-70866-5
eBook Packages: Computer Science, Computer Science (R0)
