Speech Intelligibility Based Enhancement System Using Modified Deep Neural Network and Adaptive Multi-band Spectral Subtraction

Dash, Tusar Kanti; Solanki, Sandeep Singh

doi:10.1007/s11277-019-06902-0

Published: 24 October 2019

Wireless Personal Communications, Volume 111, pages 1073–1087 (2020)

Abstract

In adverse environments, existing speech enhancement algorithms do not always produce satisfactory results. At very low signal-to-noise ratios, the processing becomes complicated and may introduce signal distortion and degrade intelligibility. To overcome the complexity of existing speech enhancement algorithms, this research proposes a hybrid concept for enhancing speech quality and intelligibility. The primary objective of the work is to increase the intelligibility delivered by a speech enhancement system trained for a particular speech signal, using a modified deep neural network (DNN) and adaptive multi-band spectral subtraction (AdMBSS). In this work, AdMBSS enhances the intelligibility of the speech signal using an additional phase information calculation, and a hybrid of DNN and Nelder-Mead optimization is then used to improve the signal quality. Experimental results show that the proposed framework achieves improved performance in signal-to-noise ratio, perceptual evaluation of speech quality (PESQ) and minimum mean square error. Finally, performance is evaluated under further noise types: bus, train, babble, airport, station and exhibition noise.
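The abstract does not describe how AdMBSS works internally. As a rough illustration of the multi-band spectral subtraction idea it builds on, the sketch below applies the generic textbook form: the power spectrum is split into bands, each band gets an over-subtraction factor from its own SNR, and a spectral floor prevents negative power. It is not the authors' adaptive variant and it omits the phase calculation. The function names, the equal-width bands, the piecewise-linear factor rule and the floor value 0.002 are assumptions chosen for illustration.

```python
import math

def band_snr_db(noisy_power, noise_power):
    """SNR of one band in dB, from per-bin power values."""
    signal = max(sum(noisy_power), 1e-12)  # guard against log10(0)
    noise = max(sum(noise_power), 1e-12)   # guard against division by zero
    return 10.0 * math.log10(signal / noise)

def over_subtraction_factor(snr_db):
    """Piecewise-linear rule (assumed): subtract more noise when SNR is low."""
    if snr_db < -5.0:
        return 4.75
    if snr_db <= 20.0:
        return 4.0 - 0.15 * snr_db
    return 1.0

def multiband_subtract(noisy_power, noise_power, n_bands=4, floor=0.002):
    """Subtract a scaled noise estimate band by band from a power spectrum.

    noisy_power and noise_power are equal-length lists of per-bin power for
    one frame. Bands are equal-width here, which is a simplification.
    """
    n = len(noisy_power)
    edges = [i * n // n_bands for i in range(n_bands + 1)]
    cleaned = []
    for lo, hi in zip(edges, edges[1:]):
        y = noisy_power[lo:hi]
        d = noise_power[lo:hi]
        alpha = over_subtraction_factor(band_snr_db(y, d))
        for yk, dk in zip(y, d):
            # Spectral floor: never go below a small fraction of the input power.
            cleaned.append(max(yk - alpha * dk, floor * yk))
    return cleaned

if __name__ == "__main__":
    frame = [10.0, 10.0, 1.0, 1.0]   # toy noisy power spectrum
    noise = [1.0, 1.0, 1.0, 1.0]     # toy noise power estimate
    print(multiband_subtract(frame, noise, n_bands=2))
```

In this toy frame the first band sits at 10 dB SNR and is subtracted moderately, while the second band sits at 0 dB, is over-subtracted and falls back to the spectral floor. In a hybrid scheme such as the one the abstract describes, parameters like the floor or the factor rule are natural candidates for tuning by an optimizer, though the abstract does not state which quantities the Nelder-Mead step adjusts.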

Author information

Authors and Affiliations

Electronics and Communication Engineering, Birla Institute of Technology, Mesra, India
Tusar Kanti Dash & Sandeep Singh Solanki
Electronics and Telecom Engineering, CV Raman College of Engineering, Bhubaneswar, India
Tusar Kanti Dash

Corresponding author

Correspondence to Tusar Kanti Dash.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Dash, T.K., Solanki, S.S. Speech Intelligibility Based Enhancement System Using Modified Deep Neural Network and Adaptive Multi-band Spectral Subtraction. Wireless Pers Commun 111, 1073–1087 (2020). https://doi.org/10.1007/s11277-019-06902-0

Issue Date: March 2020
