Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3562)

Abstract

A robust and effective voice activity detection (VAD) algorithm is proposed for improving speech recognition performance in noisy environments. The approach first filters the input channel to avoid high-energy noisy components and then determines the speech/non-speech bispectra by means of third-order auto-cumulants. The algorithm differs from many others in how the decision rule is formulated (detection tests) and in the domain in which it operates. Clear improvements in speech/non-speech discrimination accuracy demonstrate the effectiveness of the proposed VAD. It is shown that applying statistical detection tests leads to a better separation of the speech and noise distributions, thus allowing more effective discrimination and a tradeoff between complexity and performance. The algorithm also incorporates a preceding noise-reduction block that improves the accuracy of speech/non-speech detection. The experimental analysis carried out on the AURORA databases and tasks provides an extensive performance evaluation, together with an exhaustive comparison with standard VADs, such as ITU G.729, GSM AMR and the ETSI AFE for distributed speech recognition (DSR), and with other recently reported VADs.
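As a rough illustration of the bispectral idea described above, the following Python sketch estimates the third-order auto-cumulants of one frame, takes their 2-D DFT (the indirect bispectrum estimate) and thresholds a normalised bispectral energy. It is not the authors' algorithm: the input filtering and noise-reduction stages are omitted, the paper's statistical detection tests are replaced by a simple threshold, and the function names, `max_lag=4` and `threshold=1.0` are hypothetical choices. The premise it relies on is standard: all cumulants above second order vanish for a Gaussian process, so Gaussian noise has a zero bispectrum in expectation, whereas speech frames generally do not.

```python
import cmath
import math
import random


def third_order_cumulant(frame, max_lag):
    """Biased estimate of c3(t1, t2) = E[x(n) x(n+t1) x(n+t2)] for |t1|, |t2| <= max_lag.

    The frame is made zero-mean first, because the third-order cumulant equals
    the third-order moment only for zero-mean data.
    """
    n = len(frame)
    mean = sum(frame) / n
    y = [v - mean for v in frame]
    lags = range(-max_lag, max_lag + 1)
    c3 = {}
    for t1 in lags:
        for t2 in lags:
            acc = 0.0
            for i in range(n):
                j, k = i + t1, i + t2
                if 0 <= j < n and 0 <= k < n:
                    acc += y[i] * y[j] * y[k]
            c3[(t1, t2)] = acc / n  # divide by n, not by the overlap count: biased estimator
    return c3


def bispectrum(c3, max_lag):
    """Indirect bispectrum estimate: 2-D DFT of the cumulant lags on an N x N grid, N = 2*max_lag + 1.

    No lag window is applied, which keeps the sketch short but raises the variance.
    """
    size = 2 * max_lag + 1
    grid = [[0j] * size for _ in range(size)]
    for k1 in range(size):
        for k2 in range(size):
            acc = 0j
            for (t1, t2), value in c3.items():
                acc += value * cmath.exp(-2j * math.pi * (k1 * t1 + k2 * t2) / size)
            grid[k1][k2] = acc
    return grid


def bispectral_statistic(frame, max_lag=4):
    """Mean |B|^2 over the grid divided by sigma^6, so the value does not depend on frame gain."""
    n = len(frame)
    mean = sum(frame) / n
    variance = sum((v - mean) ** 2 for v in frame) / n
    if variance == 0.0:
        return 0.0  # silent (constant) frame: nothing to detect
    grid = bispectrum(third_order_cumulant(frame, max_lag), max_lag)
    size = len(grid)
    energy = sum(abs(b) ** 2 for row in grid for b in row) / (size * size)
    return energy / variance ** 3


def is_speech(frame, threshold=1.0, max_lag=4):
    """Hypothetical decision rule: a large normalised bispectral energy is taken
    as evidence of a non-Gaussian (speech) frame; threshold is an assumed value."""
    return bispectral_statistic(frame, max_lag) > threshold


if __name__ == "__main__":
    rng = random.Random(0)
    noise = [rng.gauss(0.0, 1.0) for _ in range(500)]
    pulses = [1.0 if n % 10 == 0 else 0.0 for n in range(500)]  # crude stand-in for voiced excitation
    print(is_speech(noise), is_speech(pulses))
```

On these demo frames, the Gaussian noise frame should give a statistic well below the threshold and the periodic pulse train a statistic well above it. Because the lags span exactly N consecutive values, the grid is a full 2-D DFT and, by Parseval, the statistic equals the sum of squared cumulants divided by σ⁶; a practical detector would calibrate the threshold against the noise statistics rather than fix it.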

References

  1. Karray, L., Martin, A.: Towards improving speech detection robustness for speech recognition in adverse environments. Speech Communication (3), 261–276 (2003)

  2. ETSI: Voice activity detector (VAD) for Adaptive Multi-Rate (AMR) speech traffic channels. ETSI EN 301 708 Recommendation (1999)

  3. ITU: A silence compression scheme for G.729 optimized for terminals conforming to recommendation V.70. ITU-T Recommendation G.729-Annex B (1996)

  4. Sangwan, A., Chiranth, M.C., Jamadagni, H.S., Sah, R., Prasad, R.V., Gaurav, V.: VAD techniques for real-time speech transmission on the Internet. In: IEEE International Conference on High-Speed Networks and Multimedia Communications, pp. 46–50 (2002)

  5. Gustafsson, S., Martin, R., Jax, P., Vary, P.: A psychoacoustic approach to combined acoustic echo cancellation and noise reduction. IEEE Transactions on Speech and Audio Processing 10(5), 245–256 (2002)

  6. Sohn, J., Kim, N.S., Sung, W.: A statistical model-based voice activity detection. IEEE Signal Processing Letters 6(1), 1–3 (1999)

  7. Cho, Y.D., Kondoz, A.: Analysis and improvement of a statistical model-based voice activity detector. IEEE Signal Processing Letters 8(10), 276–278 (2001)

  8. Bouquin-Jeannes, R.L., Faucon, G.: Study of a voice activity detector and its influence on a noise reduction system. Speech Communication 16, 245–254 (1995)

  9. Woo, K., Yang, T., Park, K., Lee, C.: Robust voice activity detection algorithm for estimating noise spectrum. Electronics Letters 36(2), 180–181 (2000)

  10. Li, Q., Zheng, J., Tsai, A., Zhou, Q.: Robust endpoint detection and energy normalization for real-time speech and speaker recognition. IEEE Transactions on Speech and Audio Processing 10(3), 146–157 (2002)

  11. Marzinzik, M., Kollmeier, B.: Speech pause detection for noise spectrum estimation by tracking power envelope dynamics. IEEE Transactions on Speech and Audio Processing 10(6), 341–351 (2002)

  12. Chengalvarayan, R.: Robust energy normalization using speech/non-speech discriminator for German connected digit recognition. In: Proc. of EUROSPEECH 1999, Budapest, Hungary, September 1999, pp. 61–64 (1999)

  13. Tucker, R.: Voice activity detection using a periodicity measure. IEE Proceedings, Communications, Speech and Vision 139(4), 377–380 (1992)

  14. Nemer, E., Goubran, R., Mahmoud, S.: Robust voice activity detection using higher-order statistics in the LPC residual domain. IEEE Transactions on Speech and Audio Processing 9(3), 217–231 (2001)

  15. Nikias, C., Petropulu, A.: Higher Order Spectra Analysis: a Nonlinear Signal Processing Framework. Prentice-Hall, Englewood Cliffs (1993)

  16. Brillinger, D., Rosenblatt, M.: Spectral Analysis of Time Series, ch. Asymptotic theory of estimates of kth order spectra. Wiley, Chichester (1975)

  17. Subba-Rao, T.: A test for linearity of stationary time series. Journal of Time Series Analysis 1, 145–158 (1982)

  18. Hinich, J.: Testing for Gaussianity and linearity of a stationary time series. Journal of Time Series Analysis 3, 169–176 (1982)

  19. Tugnait, J.: Two channel tests for common non-Gaussian signal detection. IEE Proceedings-F 140, 343–349 (1993)

  20. Ramírez, J., Segura, J., Benítez, C., de la Torre, A., Rubio, A.: An effective subband OSF-based VAD with noise reduction for robust speech recognition. IEEE Transactions on Speech and Audio Processing (2004) (in press)

  21. Moreno, A., Borge, L., Christoph, D., Gael, R., Khalid, C., Stephan, E., Jeffrey, A.: SpeechDat-Car: A Large Speech Database for Automotive Environments. In: Proceedings of the II LREC Conference (2000)

  22. Benyassine, A., Shlomot, E., Su, H., Massaloux, D., Lamblin, C., Petit, J.: ITU-T Recommendation G.729 Annex B: A silence compression scheme for use with G.729 optimized for V.70 digital simultaneous voice and data applications. IEEE Communications Magazine 35(9), 64–73 (1997)

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Górriz, J.M., Puntonet, C.G., Ramírez, J., Segura, J.C. (2005). Bispectra Analysis-Based VAD for Robust Speech Recognition. In: Mira, J., Álvarez, J.R. (eds) Artificial Intelligence and Knowledge Engineering Applications: A Bioinspired Approach. IWINAC 2005. Lecture Notes in Computer Science, vol 3562. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11499305_58

  • DOI: https://doi.org/10.1007/11499305_58

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-26319-7

  • Online ISBN: 978-3-540-31673-2

  • eBook Packages: Computer Science, Computer Science (R0)
