Bispectra Analysis-Based VAD for Robust Speech Recognition

Górriz, J. M.; Puntonet, C. G.; Ramírez, J.; Segura, J. C.

doi:10.1007/11499305_58

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3562)

Included in the following conference series:

International Work-Conference on the Interplay Between Natural and Artificial Computation

Abstract

A robust and effective voice activity detection (VAD) algorithm is proposed for improving speech recognition performance in noisy environments. The approach filters the input channel to suppress high-energy noise components and then determines the speech/non-speech bispectra by means of third-order auto-cumulants. The algorithm differs from many others in how the decision rule is formulated (as detection tests) and in the domain in which it operates. Clear improvements in speech/non-speech discrimination accuracy demonstrate the effectiveness of the proposed VAD. Applying a statistical detection test is shown to give better separation of the speech and noise distributions, allowing more effective discrimination and a trade-off between complexity and performance. The algorithm also incorporates a preceding noise reduction block, which improves the accuracy of speech/non-speech detection. An experimental analysis on the AURORA databases and tasks provides an extensive performance evaluation, together with an exhaustive comparison against standard VADs such as ITU G.729, GSM AMR and ETSI AFE for distributed speech recognition (DSR), and against other recently reported VADs.
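
As a rough illustration of the quantities named in the abstract, the following sketch estimates the third-order auto-cumulants of a single signal frame and takes their 2-D Fourier transform to obtain a bispectrum estimate (the textbook "indirect" estimator). It is not the authors' implementation: the function names, the `max_lag` parameter, and the biased 1/N normalisation are assumptions made for this example, and the paper's filtering stage, noise reduction block, and detection test are not reproduced.

```python
import numpy as np


def third_order_cumulants(x, max_lag):
    """Biased sample estimate of c3(t1, t2) = E[x(n) x(n+t1) x(n+t2)]
    for a mean-removed frame, with lags in [-max_lag, max_lag].
    (Hypothetical helper; the name and signature are illustrative.)"""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()  # for zero-mean data, 3rd-order cumulant == 3rd-order moment
    n = len(x)
    lags = range(-max_lag, max_lag + 1)
    c3 = np.zeros((2 * max_lag + 1, 2 * max_lag + 1))
    for i, t1 in enumerate(lags):
        for j, t2 in enumerate(lags):
            # Keep only indices k where k, k+t1 and k+t2 all lie inside the frame.
            lo = max(0, -t1, -t2)
            hi = min(n, n - t1, n - t2)
            if hi > lo:
                c3[i, j] = np.sum(
                    x[lo:hi] * x[lo + t1:hi + t1] * x[lo + t2:hi + t2]
                ) / n
    return c3


def bispectrum(c3):
    """2-D DFT of the cumulant matrix; zero lag is moved to index (0, 0)
    before the transform and zero frequency is centred afterwards."""
    return np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(c3)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gaussian = rng.standard_normal(4096)   # symmetric: cumulants near zero
    skewed = rng.exponential(1.0, 4096)    # asymmetric: non-zero cumulants
    for name, frame in (("gaussian", gaussian), ("skewed", skewed)):
        c3 = third_order_cumulants(frame, max_lag=4)
        b = bispectrum(c3)
        print(name, "zero-lag cumulant:", c3[4, 4], "max |B|:", np.abs(b).max())
```

The third-order cumulants of a Gaussian process are zero in theory, so a symmetric noise-like frame should give values close to zero while a skewed, non-Gaussian frame should not. This property is the usual motivation for working in the bispectrum domain.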

Author information

Authors and Affiliations

E.T.S.I.I., Universidad de Granada, C/Periodista Daniel Saucedo, 18071, Granada, Spain
J. M. Górriz, C. G. Puntonet, J. Ramírez & J. C. Segura

Editor information

Editors and Affiliations

E.T.S.I. Informática, Universidad Nacional de Educación a Distancia, 28040, Madrid, Spain
José Mira
E.T.S. de Ingeniería Informática, Departamento de Inteligencia Artificial, Universidad Nacional de Educación a Distancia, Juan del Rosal, 16, 28040, Madrid, Spain
José R. Álvarez

About this paper

Cite this paper

Górriz, J.M., Puntonet, C.G., Ramírez, J., Segura, J.C. (2005). Bispectra Analysis-Based VAD for Robust Speech Recognition. In: Mira, J., Álvarez, J.R. (eds) Artificial Intelligence and Knowledge Engineering Applications: A Bioinspired Approach. IWINAC 2005. Lecture Notes in Computer Science, vol 3562. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11499305_58

DOI: https://doi.org/10.1007/11499305_58
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-26319-7
Online ISBN: 978-3-540-31673-2
eBook Packages: Computer Science, Computer Science (R0)
