Abstract
This paper proposes an improved voice activity detection (VAD) algorithm for controlling discontinuous transmission (DTX) in the GSM adaptive multi-rate (AMR) speech codec. First, the original IIR filter bank and open-loop pitch detector are replaced by a wavelet filter bank and a wavelet-based pitch detection algorithm, respectively. The proposed wavelet filter bank divides the input speech signal into 9 frequency bands so that the signal level in each sub-band can be calculated. In addition, the background noise in each sub-band is estimated with a wavelet de-noising method. The wavelet filter bank is also used to detect correlated complex signals such as music. A support vector machine (SVM) is then trained to obtain an optimized non-linear VAD decision rule based on the sub-band power, noise level, pitch period, tone flag, and complex-signal warning flag of the input speech signal. With the trained SVM, the proposed VAD algorithm produces more accurate detection results. Experiments on the Aurora speech database show that the proposed algorithm outperforms AMR VAD Option 1 and performs comparably to AMR VAD Option 2.
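The sub-band analysis step can be illustrated with a minimal sketch. The abstract does not name the wavelet, the tree structure, or the band edges, so everything below is an assumption: the sketch uses an orthonormal Haar wavelet packet tree, 20 ms frames of 160 samples at 8 kHz, and a 9-band layout (`BAND_EDGES`) chosen to resemble the band split of the standard AMR VAD filter bank. The function names are hypothetical and only the per-band energy is computed, not the full feature set.

```python
import math

FS = 8000          # assumed narrowband sampling rate in Hz
FRAME_LEN = 160    # assumed frame length: 20 ms at 8 kHz
# Assumed 9-band layout in Hz (not taken from the paper).
BAND_EDGES = [0, 250, 500, 750, 1000, 1500, 2000, 2500, 3000, 4000]


def haar_split(x):
    """One orthonormal Haar analysis step: (low-pass, high-pass), decimated by 2."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    return approx, detail


def decompose(x, f_lo, f_hi, flipped=False):
    """Split the band [f_lo, f_hi) until no assumed band edge lies inside it.

    `flipped` records whether the spectrum of this branch is mirrored, which
    happens after every high-pass + decimation step. Returns a list of
    (f_lo, f_hi, coefficients) ordered by increasing frequency.
    """
    if not any(f_lo < edge < f_hi for edge in BAND_EDGES):
        return [(f_lo, f_hi, x)]
    approx, detail = haar_split(x)
    mid = (f_lo + f_hi) // 2
    if flipped:
        # Mirrored branch: the high-pass output holds the lower original band.
        lower, upper = (detail, False), (approx, True)
    else:
        lower, upper = (approx, False), (detail, True)
    return (decompose(lower[0], f_lo, mid, lower[1])
            + decompose(upper[0], mid, f_hi, upper[1]))


def subband_energies(frame):
    """Energy (sum of squared coefficients) in each of the 9 sub-bands."""
    return [sum(c * c for c in coeffs)
            for _, _, coeffs in decompose(frame, 0, FS // 2)]


if __name__ == "__main__":
    # A 125 Hz test tone; most of its energy should land in the lowest band.
    tone = [math.sin(2 * math.pi * 125 * n / FS) for n in range(FRAME_LEN)]
    bands = decompose(tone, 0, FS // 2)
    for (lo, hi, _), energy in zip(bands, subband_energies(tone)):
        print(f"{lo:4d}-{hi:4d} Hz: {energy:8.3f}")
```

Because the Haar transform is orthonormal, the 9 band energies sum to the frame energy. Haar filters have poor frequency selectivity, so a practical filter bank would likely use a longer wavelet. In the scheme described above, such per-band levels would be one group of inputs to the SVM decision rule, alongside the noise level, pitch period, tone flag, and complex-signal flag.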
© 2007 Springer Berlin Heidelberg
About this paper
Cite this paper
Chen, S.-H., Chang, Y., Truong, T.K. (2007). An Improved Voice Activity Detection Algorithm for GSM Adaptive Multi-Rate Speech Codec Based on Wavelet and Support Vector Machine. In: Okuno, H.G., Ali, M. (eds) New Trends in Applied Artificial Intelligence. IEA/AIE 2007. Lecture Notes in Computer Science, vol 4570. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73325-6_91
DOI: https://doi.org/10.1007/978-3-540-73325-6_91
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73322-5
Online ISBN: 978-3-540-73325-6
eBook Packages: Computer Science (R0)