Robust voice activity detection directed by noise classification

Saeedi, Jamal; Ahadi, Seyed Mohammad; Faez, Karim

doi:10.1007/s11760-013-0479-5

Original Paper
Published: 30 April 2013

Signal, Image and Video Processing, volume 9, pages 561–572 (2015)

Abstract

In this paper, voice activity detection (VAD) is formulated as a two-class classification problem using support vector machines (SVMs). The proposed method combines noise-robust feature extraction with SVM models trained in different background noises for speech/non-speech classification. A multi-class SVM is also used to classify the background noise in order to select the SVM model for VAD. The proposed VAD is tested on TIMIT data artificially distorted by different additive noise types and is compared with state-of-the-art VADs. Experimental results show that the proposed VAD can extract speech activity under poor SNR conditions and that it is insensitive to variable noise levels.
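
The two-stage structure described in the abstract (classify the background noise first, then apply the speech/non-speech model trained for that noise) can be illustrated with a minimal sketch. This is not the authors' implementation: the linear decision rules below are hypothetical stand-ins for trained SVMs, and the feature vectors, the noise labels "white" and "babble", and all function names are assumptions made only for illustration.

```python
from typing import Callable, Dict, List, Sequence

Frame = Sequence[float]                        # one feature vector per frame
NoiseClassifier = Callable[[List[Frame]], str]
FrameClassifier = Callable[[Frame], bool]      # True means "speech"


def noise_directed_vad(
    frames: List[Frame],
    classify_noise: NoiseClassifier,
    vad_models: Dict[str, FrameClassifier],
    fallback: str,
) -> List[bool]:
    """Select the VAD model for the detected noise type, then label each frame."""
    noise_type = classify_noise(frames)
    # Fall back to a default model if the noise label has no dedicated model.
    model = vad_models.get(noise_type, vad_models[fallback])
    return [model(frame) for frame in frames]


def linear_model(weights: Sequence[float], bias: float) -> FrameClassifier:
    """Hypothetical stand-in for a trained two-class SVM: sign of w.x + b."""
    def decide(frame: Frame) -> bool:
        return sum(w * x for w, x in zip(weights, frame)) + bias > 0.0
    return decide


def toy_noise_classifier(frames: List[Frame]) -> str:
    """Hypothetical stand-in for the multi-class noise SVM (toy rule only)."""
    mean_second = sum(frame[1] for frame in frames) / len(frames)
    return "babble" if mean_second > 0.5 else "white"


# One speech/non-speech model per noise type (weights are made up).
vad_models = {
    "white": linear_model([1.0, 0.0], -0.5),
    "babble": linear_model([1.0, -1.0], 0.0),
}

frames = [[0.9, 0.1], [0.2, 0.1], [0.7, 0.3]]
decisions = noise_directed_vad(frames, toy_noise_classifier, vad_models, fallback="white")
print(decisions)  # [True, False, True]
```

Passing the classifiers in as plain callables keeps the routing logic independent of how the models are trained, so actual SVM decision functions could be substituted for the stand-ins without changing `noise_directed_vad`.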

Author information

Authors and Affiliations

Electrical Engineering Department, Amirkabir University of Technology, 424 Hafez Ave., Tehran, Iran
Jamal Saeedi, Seyed Mohammad Ahadi & Karim Faez

Corresponding author

Correspondence to Jamal Saeedi.

About this article

Cite this article

Saeedi, J., Ahadi, S.M. & Faez, K. Robust voice activity detection directed by noise classification. SIViP 9, 561–572 (2015). https://doi.org/10.1007/s11760-013-0479-5

Received: 12 June 2012
Revised: 14 April 2013
Accepted: 14 April 2013
Published: 30 April 2013
Issue Date: March 2015
DOI: https://doi.org/10.1007/s11760-013-0479-5
