An Improved Voice Activity Detection Algorithm for GSM Adaptive Multi-Rate Speech Codec Based on Wavelet and Support Vector Machine

Chen, Shi-Huang; Chang, Yaotsu; Truong, T. K.

doi:10.1007/978-3-540-73325-6_91

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4570)

Included in the following conference series:

International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems

Abstract

This paper proposes an improved voice activity detection (VAD) algorithm for controlling discontinuous transmission (DTX) in the GSM adaptive multi-rate (AMR) speech codec. First, the IIR filter bank and the open-loop pitch detector of the original VAD are replaced by a wavelet filter bank and a wavelet-based pitch detection algorithm, respectively. The proposed wavelet filter bank divides the input speech signal into 9 frequency bands so that the signal level in each sub-band can be calculated. In addition, the background noise in each sub-band is estimated using the wavelet de-noising method. The wavelet filter bank is also used to detect correlated complex signals such as music. A support vector machine (SVM) is then trained to obtain an optimized non-linear VAD decision rule whose inputs are the sub-band power, noise level, pitch period, tone flag, and complex-signal warning flag of the input speech signal. With the trained SVM, the proposed VAD algorithm produces more accurate detection results. Experiments on the Aurora speech database show that the proposed algorithm achieves VAD performance superior to AMR VAD Option 1 and comparable to AMR VAD Option 2.
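The sub-band analysis step described in the abstract can be illustrated with a minimal sketch. It assumes an orthonormal Haar wavelet and a hypothetical unbalanced wavelet-packet tree chosen only because it has nine leaves; the abstract does not state which wavelet, tree shape, or band edges the authors use, and the names `haar_split`, `decompose`, `TREE`, and `subband_power` are illustrative. The test frame is a synthetic 440 Hz tone, 160 samples long (20 ms at 8 kHz), not data from the paper.

```python
import math

def haar_split(x):
    """One orthonormal Haar analysis step: returns (low-pass, high-pass) halves."""
    s = 1.0 / math.sqrt(2.0)
    low = [(x[i] + x[i + 1]) * s for i in range(0, len(x) - 1, 2)]
    high = [(x[i] - x[i + 1]) * s for i in range(0, len(x) - 1, 2)]
    return low, high

def decompose(x, tree):
    """Recursively split x following `tree`; None marks a leaf sub-band."""
    if tree is None:
        return [x]
    low, high = haar_split(x)
    return decompose(low, tree[0]) + decompose(high, tree[1])

# Hypothetical tree with nine leaves (an assumption, not the paper's design).
LEAF = None
PAIR = (LEAF, LEAF)
TREE = (((PAIR, PAIR), PAIR), (PAIR, LEAF))

def subband_power(frame, tree=TREE):
    """Energy (sum of squared coefficients) of each leaf sub-band."""
    return [sum(c * c for c in band) for band in decompose(frame, tree)]

# Synthetic 20 ms frame at 8 kHz: a 440 Hz tone, 160 samples.
frame = [math.sin(2.0 * math.pi * 440.0 * n / 8000.0) for n in range(160)]
levels = subband_power(frame)
frame_energy = sum(v * v for v in frame)
```

Because the Haar step is orthonormal and every split here acts on an even-length sequence, the nine sub-band energies sum to the frame energy. In a full detector along the lines of the abstract, such per-band levels, together with noise estimates and the pitch, tone, and complex-signal flags, would form the feature vector passed to the SVM.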

Author information

Authors and Affiliations

Shi-Huang Chen: Department of Computer Science and Information Engineering, Shu-Te University, Kaohsiung County, 824, Taiwan, R.O.C.
Yaotsu Chang and T. K. Truong: Department of Information Engineering, I-Shou University, Kaohsiung County, 840, Taiwan, R.O.C.

Editor information

Hiroshi G. Okuno, Moonis Ali

About this paper

Cite this paper

Chen, S.-H., Chang, Y., Truong, T.K. (2007). An Improved Voice Activity Detection Algorithm for GSM Adaptive Multi-Rate Speech Codec Based on Wavelet and Support Vector Machine. In: Okuno, H.G., Ali, M. (eds) New Trends in Applied Artificial Intelligence. IEA/AIE 2007. Lecture Notes in Computer Science, vol 4570. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73325-6_91

DOI: https://doi.org/10.1007/978-3-540-73325-6_91
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73322-5
Online ISBN: 978-3-540-73325-6
eBook Packages: Computer Science, Computer Science (R0)
