Audio signal quality enhancement using multi-layered convolutional neural network based auto encoder–decoder

Raj, Shivangi; Prakasam, P.; Gupta, Shubham

doi:10.1007/s10772-021-09809-z

Published: 28 January 2021

Volume 24, pages 425–437, (2021)

International Journal of Speech Technology

Abstract

This article proposes a multi-layered convolutional neural network (MLCNN) based auto-CODEC for audio signal enhancement that uses Mel-frequency cepstral coefficients (MFCC). The MLCNN takes as input MFCCs computed over different frames of the noise-contaminated audio signal, for both training and testing. The MLCNN models were trained and tested using 80:20 and 70:30 splits of the available database. The method was verified and validated using the MNIST database, where the proposed MLCNN model achieves an accuracy of 93.25%. Performance was evaluated using short-time objective intelligibility (STOI), perceptual evaluation of speech quality (PESQ), and cosine similarity. In comparisons with previously reported models, the proposed MLCNN model outperforms them. The cosine similarity results indicate that MLCNN provides a high security level, which makes it suitable for many secure applications.
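
The abstract names two generic building blocks: a train/test split at a fixed ratio and cosine similarity as an evaluation measure. The following is a minimal sketch of both, using only the Python standard library. It is not the authors' implementation: the abstract does not say how the split was drawn or which vectors the cosine similarity was computed on, so the function names, the shuffling step, and the `seed` parameter are assumptions made for illustration.

```python
import math
import random
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def split_dataset(
    items: Sequence, train_fraction: float, seed: int = 0
) -> Tuple[List, List]:
    """Shuffle items reproducibly, then split them into (train, test).

    Assumption: a random shuffle precedes the split; the source does not
    state how the 80:20 and 70:30 partitions were formed.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must lie strictly between 0 and 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```

With 100 items, `split_dataset(items, 0.8)` yields 80 training and 20 test items, and `split_dataset(items, 0.7)` yields 70 and 30, matching the two ratios mentioned in the abstract. A cosine similarity close to 1 between an enhanced and a clean feature vector would indicate that they point in nearly the same direction.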

Author information

Authors and Affiliations

School of Electronics Engineering, Vellore Institute of Technology, Vellore, India
Shivangi Raj, P. Prakasam & Shubham Gupta

Corresponding author

Correspondence to P. Prakasam.

Ethics declarations

Conflict of interest

The authors declare that there is no conflict of interest regarding the publication of this paper and that the work presented in this article is not supported by any funding agency.

About this article

Cite this article

Raj, S., Prakasam, P. & Gupta, S. Audio signal quality enhancement using multi-layered convolutional neural network based auto encoder–decoder. Int J Speech Technol 24, 425–437 (2021). https://doi.org/10.1007/s10772-021-09809-z

Received: 24 July 2020
Accepted: 06 January 2021
Published: 28 January 2021
Issue Date: June 2021
DOI: https://doi.org/10.1007/s10772-021-09809-z
