Robust voice activity detection directed by noise classification

  • Original Paper
Signal, Image and Video Processing

Abstract

In this paper, voice activity detection (VAD) is formulated as a two-class classification problem and solved with support vector machines (SVMs). The proposed method combines a noise-robust feature extraction process with SVM models trained under different background noises for speech/non-speech classification. A multi-class SVM is also used to classify the background noise in order to select the SVM model used for VAD. The proposed VAD is tested on TIMIT data artificially distorted by different additive noise types and is compared with state-of-the-art VADs. Experimental results show that the proposed VAD can extract speech activity under poor SNR conditions and that it is insensitive to variable noise levels.
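The two ideas in the abstract, distorting clean speech with additive noise at a chosen SNR and letting a noise classifier pick which speech/non-speech model to apply, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the names `mix_at_snr` and `NoiseDirectedVAD`, the per-frame feature layout, and the use of plain callables in place of trained SVMs are all assumptions made for the example.

```python
import math
from typing import Callable, Dict, List, Sequence

Frame = Sequence[float]  # one frame's feature vector (hypothetical layout)


def mix_at_snr(speech: Sequence[float], noise: Sequence[float],
               snr_db: float) -> List[float]:
    """Add noise to speech, scaled so the mixture has the requested SNR (dB)."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[: len(speech)]
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    if p_speech == 0 or p_noise == 0:
        raise ValueError("speech and noise must both have non-zero power")
    # Pick gain g so that 10*log10(p_speech / (g**2 * p_noise)) == snr_db.
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


class NoiseDirectedVAD:
    """Select a speech/non-speech model according to the detected noise type.

    Both arguments are stand-ins (assumptions): in the paper's setting they
    would be a multi-class SVM and one two-class SVM per noise type.
    """

    def __init__(self,
                 noise_classifier: Callable[[Sequence[Frame]], str],
                 vad_models: Dict[str, Callable[[Frame], bool]]) -> None:
        self.noise_classifier = noise_classifier
        self.vad_models = vad_models

    def detect(self, frames: Sequence[Frame]) -> List[bool]:
        # Step 1: label the background noise for this stretch of audio.
        noise_type = self.noise_classifier(frames)
        # Step 2: apply the speech/non-speech model trained for that noise.
        model = self.vad_models[noise_type]
        return [model(frame) for frame in frames]


if __name__ == "__main__":
    noisy = mix_at_snr([1.0, -1.0, 1.0, -1.0], [0.5, 0.5, -0.5, -0.5], 0.0)
    print(noisy)
    vad = NoiseDirectedVAD(
        noise_classifier=lambda frames: "babble",
        vad_models={"babble": lambda f: f[0] > 0.5, "white": lambda f: False},
    )
    print(vad.detect([[0.9], [0.1]]))
```

In a fuller experiment the lambdas would be replaced by trained classifiers, for example the `predict` methods of SVM objects from a library such as LIBSVM or scikit-learn, and the frames would carry the noise-robust features the paper describes rather than the toy one-element vectors used here.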

Author information

Corresponding author

Correspondence to Jamal Saeedi.

About this article

Cite this article

Saeedi, J., Ahadi, S.M. & Faez, K. Robust voice activity detection directed by noise classification. SIViP 9, 561–572 (2015). https://doi.org/10.1007/s11760-013-0479-5

  • DOI: https://doi.org/10.1007/s11760-013-0479-5
