Abstract
Background noise in the surroundings degrades the accuracy of voice or speech recognition. Automatic speech recognition and communication systems therefore use speech enhancement to reduce or eliminate the surrounding noise, and a range of techniques exist for enhancing corrupted speech signals. In this paper, a Recurrent Convolutional Encoder-Decoder (R-CED) network is proposed for enhancing speech by eliminating noise signals. The efficiency of the proposed work is determined by comparing performance metrics, namely PESQ, STOI and CER, against those of various existing techniques. The results obtained indicate that the proposed R-CED is more efficient than the existing techniques.
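Of the three metrics named above, CER is the simplest to illustrate. The following is a minimal sketch, assuming CER denotes the character error rate in its usual form: the Levenshtein edit distance between a reference transcript and a recognized transcript, divided by the reference length. The function names are hypothetical and are not taken from the paper, and the paper's own evaluation setup may differ. PESQ and STOI are not sketched here because they depend on standardized reference implementations.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two strings (illustrative helper)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty string
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(reference, hypothesis):
    """CER = edit distance / number of reference characters (assumed definition)."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical transcripts, used only to exercise the function.
    print(character_error_rate("speech enhancement", "speech enhancment"))
```

A lower CER on the recognizer's output for enhanced speech would indicate that enhancement helped recognition. Note that CER can exceed 1.0 when the hypothesis is much longer than the reference.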
Cite this article
Karthik, A., MazherIqbal, J.L. Efficient Speech Enhancement Using Recurrent Convolution Encoder and Decoder. Wireless Pers Commun 119, 1959–1973 (2021). https://doi.org/10.1007/s11277-021-08313-6
DOI: https://doi.org/10.1007/s11277-021-08313-6