Reference free speech quality estimation for diverse data condition

Shome, Nirupam; Laskar, R. H.; Das, Debaprasad

doi:10.1007/s10772-018-9537-2

Reference free speech quality estimation for diverse data condition

Published: 20 August 2018

Volume 22, pages 585–599, (2019)
Cite this article

International Journal of Speech Technology Aims and scope Submit manuscript

Nirupam Shome¹,
R. H. Laskar² &
Debaprasad Das¹

189 Accesses
7 Citations
Explore all metrics

Abstract

The performance of any speech based systems depends on the quality of input speech signals. The signal to noise ratio (SNR) is considered to be a measure of the quality of speech signal. This paper reports some analyses (based on experimental evaluations) that are focused on calculating the quality of the speech signal so as to improve the overall accuracy of the system. As compared to existing methods that are based on voice activity detection (VAD), the proposed method is based on glottal activity detection (GAD) to detect the speech and non-speech regions from the input speech signals. Literature reveals that the GAD provides better results than VAD under noisy data condition for the above mentioned task. The proposed method uses a filter with specified cut-off frequency to separate the noise components that are present in the speech activity region. After that, we calculate the SNR as the ratio of the total energy in the speech activity region to that of the non-speech regions. The comparative analyses with two state-of-the-art techniques, viz, National Institute of Standards and Technology (NIST) SNR tool and Waveform Amplitude Distribution Analysis (WADA) SNR algorithm suggest that under different data conditions the proposed method outperforms the existing ones. Since the proposed model depends on the signal processing techniques and does not require any classifier or model to be developed, therefore it is found to be computationally efficient as compared to other two methods.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

References

Breithaupt, C., Gerkmann, T., & Martin, R. (2008). A novel a priori SNR estimation approach based on selective cepstro-temporal smoothing. In International Conference on Acoustics, Speech and Signal Processing. (pp. 4897–4900). IEEE.
Cohen, I. (2005a). Relaxed statistical model for speech enhancement and a priori SNR estimation. IEEE Transactions on Speech and Audio Processing, 13(5), 870–881.
Article Google Scholar
Cohen, I. (2005b). Speech enhancement using super-Gaussian speech models and noncausal a priori SNR estimation. Speech Communication, 47(3), 336–350.
Article Google Scholar
Elshamy, S., Madhu, N., Tirry, W., & Fingscheidt, T. (2015). An iterative speech model-based a priori SNR estimator. In Sixteenth Annual Conference of the International Speech Communication Association.
Ephraim, Y., & Malah, D. (1984). Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator. IEEE Transactions on Acoustics, Speech, and Signal Processing, 32(6), 1109–1121.
Article Google Scholar
Ephraim, Y., & Malah, D. (1985). Speech enhancement using a minimum mean-square error log-spectral amplitude estimator. IEEE Transactions on Acoustics, Speech, and Signal Processing, 33(2), 443–445.
Article Google Scholar
Fodor, B., & Fingscheidt, T. (2012). Reference-free SNR measurement for narrowband and wideband speech signals in car noise. In Proceedings of Speech Communication: ITG Symposium (pp. 1–4). VDE.
Furui, S. (1989). Digital speech processing synthesis, and recognition. New York: Marcel Dekker.
Google Scholar
Garofolo, J. S., Lamel, L. F., Fisher, W. M., Fiscus, J. G., Pallett, D. S., Dahlgren, N. L., & Zue, V. (1993). Timit acoustic-phonetic continuous speech corpus. Philadelphia: Linguistic Data Consortium.
Google Scholar
Gerkmann, T., Breithaupt, C., & Martin, R. (2008). Improved a posteriori speech presence probability estimation based on a likelihood ratio with fixed priors. IEEE Transactions on Audio, Speech, and Language Processing, 16(5), 910–919.
Article Google Scholar
Hansen, J. H., & Pellom, B. L. (1998). An effective quality evaluation protocol for speech enhancement algorithms. In Fifth International Conference on Spoken Language Processing.
Hirsch, H. G., & Ehrlicher, C. (1995). Noise estimation techniques for robust speech recognition. In International Conference on Acoustics, Speech, and Signal Processing ICASSP-95. (Vol. 1, pp. 153–156). IEEE.
Hu, G., & Wang, D. (2008). Segregation of unvoiced speech from nonspeech interference. The Journal of the Acoustical Society of America, 124(2), 1306–1319.
Article Google Scholar
Hu, Y., & Loizou, P. C. (2007). Subjective comparison and evaluation of speech enhancement algorithms. Speech Communication, 49(7–8), 588–601.
Article Google Scholar
Kim, C., & Stern, R. M. (2008). Robust signal-to-noise ratio estimation based on waveform amplitude distribution analysis. In Ninth Annual Conference of the International Speech Communication Association.
Kinnunen, T., & Li, H. (2010). An overview of text-independent speaker recognition: From features to supervectors. Speech Communication, 52(1), 12–40.
Article Google Scholar
Lotter, T., & Vary, P. (2005). Speech enhancement by MAP spectral amplitude estimation using a super-Gaussian speech model. EURASIP Journal on Advances in Signal Processing, 2005(7), 354850.
Article MATH Google Scholar
Manam, A. B., Revanth, T. S., Das, R. K., & Prasanna, S. M. (2016). Speaker verification using acoustic factor analysis with phonetic content compensation in limited and degraded test conditions. In Region 10 Conference (TENCON), 2016 IEEE (pp. 1402–1406). IEEE.
Martin, R. (1993). An efficient algorithm to estimate the instantaneous SNR of speech signals. In Third European Conference on Speech Communication and Technology.
Martin, R. (2001). Noise power spectral density estimation based on optimal smoothing and minimum statistics. IEEE Transactions on Speech and Audio Processing, 9(5), 504–512.
Article Google Scholar
Moazzeni, T., Amei, A., Ma, J., & Jiang, Y. (2012). Statistical model based SNR estimation method for speech signals. Electronics Letters, 48(12), 727–729.
Article Google Scholar
Morales-Cordovilla, J. A., Ma, N., Sánchez, V., Carmona, J. L., Peinado, A. M., & Barker, J. (2011). A pitch based noise estimation technique for robust speech recognition with missing data. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 4808–4811). IEEE.
Murty, K. S. R., Yegnanarayana, B., & Joseph, M. A. (2009). Characterization of glottal activity from speech signals. IEEE Signal Processing Letters, 16(6), 469–472.
Article Google Scholar
Narayanan, A., & Wang, D. (2012). A CASA-based system for long-term SNR estimation. IEEE Transactions on Audio, Speech, and Language Processing, 20(9), 2518–2527.
Article Google Scholar
Paez, M., & Glisson, T. (1972). Minimum mean-squared-error quantization in speech PCM and DPCM systems. IEEE Transactions on Communications, 20(2), 225–230.
Article Google Scholar
Papadopoulos, P., Tsiartas, A., & Narayanan, S. (2016). Long-term SNR estimation of speech signals in known and unknown channel conditions. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24(12), 2495–2506.
Article Google Scholar
Plapous, C., Marro, C., & Scalart, P. (2006). Improved signal-to-noise ratio estimation for speech enhancement. IEEE Transactions on Audio, Speech, and Language Processing, 14(6), 2098–2108.
Article Google Scholar
Pollak, P., & Vondrasek, M. (2005). Methods for speech SNR estimation&58: Evaluation tool and analysis of VAD dependency. Radioengineering, 14(1), 6–11.
Google Scholar
Rabiner, L. R., & Schafer, R. W. (1978). Digital processing of speech signals. Upper Saddle River: Prentice Hall.
Google Scholar
Ren, Y., & Johnson, M. T. (2008). An improved SNR estimator for speech enhancement. In IEEE International Conference on Acoustics, Speech and Signal Processing. ICASSP. (pp. 4901–4904). IEEE.
Saha, P., Baruah, U., Laskar, R. H., Mishra, S., Choudhury, S. P., & Das, T. K. (2016). Robust analysis for improvement of vowel onset point detection under noisy conditions. International Journal of Speech Technology, 19(3), 433–448.
Article Google Scholar
Scalart, P. (1996). Speech enhancement based on a priori signal to noise estimation. In IEEE International Conference on Acoustics, Speech, and Signal Processing (Vol. 2, pp. 629–632). IEEE.
Sohn, J., Kim, N. S., & Sung, W. (1999). A statistical model-based voice activity detection. IEEE Signal Processing Letters, 6(1), 1–3.
Article Google Scholar
Suhadi, S., Last, C., & Fingscheidt, T. (2011). A data-driven approach to a priori SNR estimation. IEEE Transactions on Audio, Speech, and Language Processing, 19(1), 186–195.
Article Google Scholar
Tchorz, J., & Kollmeier, B. (2003). SNR estimation based on amplitude modulation analysis with applications to noise suppression. IEEE Transactions on Speech and Audio Processing, 11(3), 184–192.
Article Google Scholar
The NIST Speech, SNR Measurement [Online]. http://www.nist.gov/smartspace/nistspeechsnrmeasurement.html.
Varga, A., Steenneken, H. J. M., Tomlinson, M., & Jones, D. (1992). The NOISEX-92 study on the effect of additive noise on automatic speech recognition, 1992. Documentation included in the NOISEX-92 CD-ROMs.
Wang, D. (2005). Speech separation by humans and machines (pp. 181–197). Boston: Springer.
Book Google Scholar
Zhao, X., Shao, Y., & Wang, D. (2011). Robust speaker identification using a CASA front-end. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). (pp. 5468–5471). IEEE

Download references

Author information

Authors and Affiliations

Department of Electronics & Communication Engineering, Assam University, Silchar, Assam, India
Nirupam Shome & Debaprasad Das
Department of Electronics & Communication Engineering, NIT, Silchar, Assam, India
R. H. Laskar

Authors

Nirupam Shome
View author publications
You can also search for this author in PubMed Google Scholar
R. H. Laskar
View author publications
You can also search for this author in PubMed Google Scholar
Debaprasad Das
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Nirupam Shome.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Shome, N., Laskar, R.H. & Das, D. Reference free speech quality estimation for diverse data condition. Int J Speech Technol 22, 585–599 (2019). https://doi.org/10.1007/s10772-018-9537-2

Download citation

Received: 27 February 2018
Accepted: 13 July 2018
Published: 20 August 2018
Issue Date: September 2019
DOI: https://doi.org/10.1007/s10772-018-9537-2

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Reference free speech quality estimation for diverse data condition

Abstract

Access this article

Similar content being viewed by others

Non-negative Frequency-Weighted Energy-Based Speech Quality Estimation for Different Modes and Quality of Speech

Robust Voice Activity Detection Based on Concept of Modulation Transfer Function in Noisy Reverberant Environments

Variable Quantile Level Based Noise Suppression for Robust Speech Recognition

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Reference free speech quality estimation for diverse data condition

Abstract

Access this article

Similar content being viewed by others

Non-negative Frequency-Weighted Energy-Based Speech Quality Estimation for Different Modes and Quality of Speech

Robust Voice Activity Detection Based on Concept of Modulation Transfer Function in Noisy Reverberant Environments

Variable Quantile Level Based Noise Suppression for Robust Speech Recognition

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation