Abstract
In this paper, voice activity detection (VAD) is formulated as a two-class classification problem solved with support vector machines (SVMs). The proposed method combines a noise-robust speech feature extraction process with SVM models trained in different background noises for speech/non-speech classification. A multi-class SVM is also used to classify the background noise in order to select the SVM model used for VAD. The proposed VAD is tested on TIMIT data artificially distorted by different additive noise types and is compared with state-of-the-art VADs. Experimental results show that the proposed VAD can extract speech activity under poor SNR conditions and that it is insensitive to variable noise levels.
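The two-stage idea in the abstract (classify the background noise first, then apply the speech/non-speech SVM trained for that noise) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the 2-D toy features and their cluster centres, the noise labels, the linear kernel, the small sub-gradient trainer, the pairwise (one-vs-one) voting scheme, and the premise that the first few frames of a recording contain noise only.

```python
import random
from collections import Counter
from itertools import combinations

def train_linear_svm(X, y, epochs=200, lr=0.1, lam=1e-3, seed=0):
    """Linear SVM (hinge loss + L2 penalty) fitted by stochastic
    sub-gradient descent. Labels in y must be +1 or -1."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            w = [wj * (1.0 - lr * lam) for wj in w]   # L2 shrinkage
            if margin < 1.0:                          # hinge loss is active
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b

def svm_score(model, x):
    """Signed decision value of a trained (weights, bias) model."""
    w, b = model
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def cluster(rng, center, n=30, spread=0.2):
    """Synthetic 2-D frame features scattered around a centre."""
    return [[rng.gauss(c, spread) for c in center] for _ in range(n)]

# Hypothetical feature centres of noise-only frames for each noise type.
NOISE_CENTERS = {"white": (0.0, 0.0), "babble": (4.0, 0.0), "factory": (0.0, 4.0)}
SPEECH_SHIFT = (2.0, 2.0)  # assumed offset of speech frames from the noise centre

rng = random.Random(42)
noise_frames = {k: cluster(rng, c) for k, c in NOISE_CENTERS.items()}
speech_frames = {
    k: cluster(rng, (c[0] + SPEECH_SHIFT[0], c[1] + SPEECH_SHIFT[1]))
    for k, c in NOISE_CENTERS.items()
}

# Stage 1: multi-class noise classifier, one binary SVM per pair of noise types.
pairwise = {}
for first, second in combinations(NOISE_CENTERS, 2):
    X = noise_frames[first] + noise_frames[second]
    y = [1] * len(noise_frames[first]) + [-1] * len(noise_frames[second])
    pairwise[(first, second)] = train_linear_svm(X, y)

# Stage 2: one speech/non-speech SVM per noise type.
vad_models = {}
for k in NOISE_CENTERS:
    X = speech_frames[k] + noise_frames[k]
    y = [1] * len(speech_frames[k]) + [-1] * len(noise_frames[k])
    vad_models[k] = train_linear_svm(X, y)

def classify_noise(frames):
    """Pairwise voting over all given frames; returns the winning noise label."""
    votes = Counter()
    for x in frames:
        for (first, second), model in pairwise.items():
            votes[first if svm_score(model, x) > 0 else second] += 1
    return votes.most_common(1)[0][0]

def detect_speech(frames, n_lead=5):
    """Pick the VAD model from the first n_lead frames (assumed noise only),
    then label every frame as speech (True) or non-speech (False)."""
    noise_type = classify_noise(frames[:n_lead])
    model = vad_models[noise_type]
    return noise_type, [svm_score(model, x) > 0 for x in frames]

# Toy recording: 5 noise frames, 3 speech frames, 2 noise frames.
recording = [[4.0, 0.0]] * 5 + [[6.0, 2.0]] * 3 + [[4.0, 0.0]] * 2
noise_type, flags = detect_speech(recording)
print(noise_type, flags)
```

The design point this sketch tries to convey is the dispatch step: the per-noise detectors never have to cope with mismatched noise, because the noise classifier chooses which one runs. A real system would replace the toy features with the paper's noise-robust features and would likely use a non-linear kernel.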
Cite this article
Saeedi, J., Ahadi, S.M. & Faez, K. Robust voice activity detection directed by noise classification. SIViP 9, 561–572 (2015). https://doi.org/10.1007/s11760-013-0479-5
DOI: https://doi.org/10.1007/s11760-013-0479-5