Abstract
The performance of continuous automatic speech recognition systems (CASRS) over communication networks degrades rapidly in the presence of speech-signal variability such as environmental noise, the communication channel, and the speech codec. Several techniques have been proposed to improve recognition accuracy. An ASR system consists of two main processing steps: feature extraction (front-end) and classification (back-end). We are motivated to develop speech separation algorithms (feature enhancement) that improve both the intelligibility of noisy speech and the accuracy of ASR. We use non-negative matrix factorization and an ideal binary mask, estimated by a deep neural network (DNN), to exploit the spectro-temporal structure of magnitude spectrograms for supervised speech separation. The ASR back-end is a convolutional neural network whose input is log-Mel cepstral features. The system was trained on 440 sentences from 20 speakers, encoded with the AMR-NB codec and contaminated with noise at several signal-to-noise ratios (0 dB, 5 dB and 10 dB).
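The abstract does not give implementation details, but the two ingredients it names — the ideal binary mask over magnitude spectrograms and mixing speech with noise at a target SNR — can be sketched as follows. This is a minimal illustration, assuming magnitude spectrograms are available as NumPy arrays; the function names and the 0 dB local-SNR criterion are illustrative, not taken from the paper.

```python
import numpy as np

def ideal_binary_mask(speech_mag, noise_mag, lc_db=0.0):
    """Ideal binary mask: 1 where the local SNR of a time-frequency
    unit exceeds the local criterion lc_db, else 0 (Wang, 2005)."""
    eps = 1e-12  # avoid log(0) / division by zero
    local_snr_db = 20.0 * np.log10((speech_mag + eps) / (noise_mag + eps))
    return (local_snr_db > lc_db).astype(np.float32)

def mix_at_snr(speech, noise, snr_db):
    """Scale the noise waveform so the mixture speech + noise has the
    requested global SNR in dB (e.g. 0, 5 or 10 dB as in the abstract)."""
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return speech + scale * noise
```

In a supervised setting such as the one described, the DNN would be trained to predict `ideal_binary_mask(...)` from features of the noisy mixture, and the estimated mask would then be applied elementwise to the mixture spectrogram before resynthesis or feature extraction.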
Cite this article
Bouchakour, L., Debyeche, M. Noise-robust speech recognition in mobile network based on convolution neural networks. Int J Speech Technol 25, 269–277 (2022). https://doi.org/10.1007/s10772-021-09950-9