An Improved Voice Activity Detection Algorithm for GSM Adaptive Multi-Rate Speech Codec Based on Wavelet and Support Vector Machine

  • Conference paper
New Trends in Applied Artificial Intelligence (IEA/AIE 2007)

Abstract

This paper proposes an improved voice activity detection (VAD) algorithm for controlling discontinuous transmission (DTX) in the GSM adaptive multi-rate (AMR) speech codec. First, the original IIR filter bank and open-loop pitch detector are reimplemented using a wavelet filter bank and a wavelet-based pitch detection algorithm, respectively. The proposed wavelet filter bank divides the input speech signal into 9 frequency bands so that the signal level in each sub-band can be calculated. In addition, the background noise in each sub-band can be estimated using wavelet de-noising. The wavelet filter bank is also used to detect correlated complex signals such as music. A support vector machine (SVM) is then trained to obtain an optimized non-linear VAD decision rule based on the sub-band power, noise level, pitch period, tone flag, and complex-signal warning flag of the input speech signal. With the trained SVM, the proposed VAD algorithm produces more accurate detection results. Experiments on the Aurora speech database show that the proposed algorithm performs considerably better than AMR VAD Option 1 and comparably to AMR VAD Option 2.
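As a rough illustration of the two main ingredients named in the abstract, the sketch below splits a frame into 9 sub-bands with a wavelet-packet tree and then evaluates a kernel SVM decision function on a feature vector. It is not the authors' implementation. The Haar wavelet, the band edges in `BAND_EDGES`, the 160-sample frame at 8 kHz, the RBF kernel, and all function names are assumptions chosen for brevity; a practical system would likely use a longer wavelet with better frequency selectivity and an SVM trained on labelled speech.

```python
import math

# ASSUMPTION: 9 band edges in Hz over 0-4 kHz, chosen for illustration only.
BAND_EDGES = [0, 250, 500, 750, 1000, 1500, 2000, 2500, 3000, 4000]


def haar_split(x):
    """One stage of the orthonormal Haar analysis bank: (low-pass, high-pass).

    Each output has half the length of x; a trailing odd sample is dropped.
    """
    s = math.sqrt(2.0)
    pairs = range(0, len(x) - 1, 2)
    low = [(x[i] + x[i + 1]) / s for i in pairs]
    high = [(x[i] - x[i + 1]) / s for i in pairs]
    return low, high


def subband_energies(x, lo=0.0, hi=4000.0, mirrored=False):
    """Split [lo, hi) recursively until no band edge lies strictly inside it.

    Returns (lo_hz, hi_hz, energy) tuples in increasing frequency order.
    `mirrored` tracks the spectral inversion introduced when a high-pass
    branch is decimated: in a mirrored node the low-pass output holds the
    upper half of the physical band, so the two outputs are swapped.
    """
    if not any(lo < edge < hi for edge in BAND_EDGES):
        return [(lo, hi, sum(v * v for v in x))]
    low, high = haar_split(x)
    lower, upper = (high, low) if mirrored else (low, high)
    mid = (lo + hi) / 2.0
    return (subband_energies(lower, lo, mid, False)
            + subband_energies(upper, mid, hi, True))


def log_levels(frame):
    """Sub-band energies in dB (hypothetical feature vector for the SVM)."""
    return [10.0 * math.log10(e + 1e-12) for _, _, e in subband_energies(frame)]


def svm_is_speech(features, support_vectors, dual_coefs, bias, gamma):
    """Sign of an RBF-kernel SVM decision function.

    score = bias + sum_i coef_i * exp(-gamma * ||features - sv_i||^2)
    The support vectors, coefficients, bias and gamma would come from
    training; the kernel choice is an assumption, not taken from the paper.
    """
    score = bias
    for sv, coef in zip(support_vectors, dual_coefs):
        dist2 = sum((a - b) ** 2 for a, b in zip(features, sv))
        score += coef * math.exp(-gamma * dist2)
    return score > 0.0


if __name__ == "__main__":
    # 20 ms frame at 8 kHz containing a 125 Hz tone.
    frame = [math.sin(2 * math.pi * 125 * n / 8000) for n in range(160)]
    for lo, hi, energy in subband_energies(frame):
        print(f"{lo:6.0f}-{hi:<6.0f} Hz  energy = {energy:8.3f}")
```

Because the Haar bank is orthonormal, the sub-band energies sum to the frame energy, which gives a quick sanity check on the tree. The remaining features listed in the abstract (noise level, pitch period, tone flag, complex-signal flag) would be appended to the vector passed to `svm_is_speech`.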

Editor information

Hiroshi G. Okuno, Moonis Ali

Copyright information

© 2007 Springer Berlin Heidelberg

About this paper

Cite this paper

Chen, SH., Chang, Y., Truong, T.K. (2007). An Improved Voice Activity Detection Algorithm for GSM Adaptive Multi-Rate Speech Codec Based on Wavelet and Support Vector Machine. In: Okuno, H.G., Ali, M. (eds) New Trends in Applied Artificial Intelligence. IEA/AIE 2007. Lecture Notes in Computer Science, vol 4570. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73325-6_91

  • DOI: https://doi.org/10.1007/978-3-540-73325-6_91

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-73322-5

  • Online ISBN: 978-3-540-73325-6

  • eBook Packages: Computer Science, Computer Science (R0)
