Speech enhancement by combining spectral subtraction and minimum mean square error-spectrum power estimator based on zero crossing

Yadava, Thimmaraja G.; Jayanna, H. S.

doi:10.1007/s10772-018-9506-9

Published: 10 April 2018

Volume 22, pages 639–648 (2019)

International Journal of Speech Technology

Thimmaraja G. Yadava¹ &
H. S. Jayanna²

Abstract

Speech data collected in uncontrolled environments must be processed before it can be used to build a robust automatic speech recognition system. This paper proposes a method for processing the degraded speech signal. First, spectral subtraction with voice activity detection (SS-VAD) and magnitude squared spectrum (MSS) estimators are studied for different types of noise. In the SS-VAD method, the degraded speech data is sampled and windowed into frames with 50% overlap, and the VAD is used to detect the voiced regions of the speech signal. Three MSS estimators are studied individually: the minimum mean square error-short time power spectrum estimator, the minimum mean square error-spectrum power estimator based on zero crossing (MMSE-SPZC), and the maximum a posteriori estimator. These MSS estimators are implemented on the assumption that the magnitude squared spectrum of the degraded speech signal is the sum of those of the clean (original) speech signal and the noise. The experimental results show that the MMSE-SPZC estimator performs better than the other two estimators. This estimator is then combined with the SS-VAD method to improve performance further. The combined SS-VAD and MMSE-SPZC method yields better speech quality than either individual method by reducing the noise in the degraded speech signal.
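As a rough illustration of the SS-VAD idea described in the abstract, the sketch below frames a signal with 50% overlap, uses a simple energy threshold as a stand-in voice activity detector, averages the magnitude squared spectrum of the frames judged to be noise, and subtracts that estimate from every frame under the additive assumption stated above. It is not the authors' implementation, and it does not implement MMSE-SPZC. The function names, the Hann window, the frame length, the energy-based VAD rule and the parameters `vad_factor` and `floor` are all assumptions made for this example.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform (adequate for short frames)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    """Inverse DFT; only the real part is returned because the frames are real."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def spectral_subtract(noisy, frame_len=64, vad_factor=3.0, floor=0.01):
    """Power spectral subtraction with a toy energy-based VAD (illustrative only).

    Returns the enhanced signal and a per-frame list of speech/non-speech flags.
    """
    hop = frame_len // 2  # 50% overlap between consecutive frames
    # Periodic Hann window: copies shifted by half a frame sum to one.
    win = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_len)
           for i in range(frame_len)]
    starts = range(0, len(noisy) - frame_len + 1, hop)
    frames = [noisy[s:s + frame_len] for s in starts]

    # Toy VAD (assumption): a frame counts as speech when its energy exceeds
    # vad_factor times the energy of the quietest frame.
    energies = [sum(v * v for v in f) for f in frames]
    threshold = vad_factor * min(energies)
    is_speech = [e > threshold for e in energies]

    spectra = [dft([w * v for w, v in zip(win, f)]) for f in frames]

    # Noise estimate: mean magnitude squared spectrum over non-speech frames.
    # The quietest frame is never flagged as speech, so this list is non-empty.
    noise_spectra = [s for s, flag in zip(spectra, is_speech) if not flag]
    noise_mss = [sum(abs(s[k]) ** 2 for s in noise_spectra) / len(noise_spectra)
                 for k in range(frame_len)]

    enhanced = [0.0] * len(noisy)
    for i, spec in enumerate(spectra):
        cleaned = []
        for k, y in enumerate(spec):
            p = abs(y) ** 2
            # Additive model: clean MSS is roughly noisy MSS minus noise MSS,
            # with a spectral floor so the estimate never goes negative.
            p_hat = max(p - noise_mss[k], floor * p)
            gain = math.sqrt(p_hat / p) if p > 0 else 0.0
            cleaned.append(gain * y)  # keep the noisy phase
        # Overlap-add the enhanced frame back into the output signal.
        for t, v in enumerate(idft(cleaned)):
            enhanced[i * hop + t] += v
    return enhanced, is_speech
```

Calling `spectral_subtract(samples)` on a list of floats returns the enhanced samples together with the VAD decisions. The naive DFT keeps the sketch free of third-party dependencies; a practical version would use an FFT routine and a more careful noise tracker.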

Author information

Authors and Affiliations

Department of Electronics and Communication Engineering, Siddaganga Institute of Technology, Tumkur, Karnataka, India
Thimmaraja G. Yadava
Department of Information Science and Engineering, Siddaganga Institute of Technology, Tumkur, Karnataka, India
H. S. Jayanna

Corresponding author

Correspondence to Thimmaraja G. Yadava.

About this article

Cite this article

Yadava, T.G., Jayanna, H.S. Speech enhancement by combining spectral subtraction and minimum mean square error-spectrum power estimator based on zero crossing. Int J Speech Technol 22, 639–648 (2019). https://doi.org/10.1007/s10772-018-9506-9

Received: 16 August 2017
Accepted: 29 March 2018
Published: 10 April 2018
Issue Date: September 2019
DOI: https://doi.org/10.1007/s10772-018-9506-9
