Abstract
An algorithm for a voice activity detector (VAD) is proposed. It is based on an exponential generalized autoregressive conditional heteroscedasticity (EGARCH) filter for generalized hyperbolic (GH) and Gaussian random variables, adaptive threshold values, and autocorrelation coefficients. EGARCH models are a new variation of GARCH models used especially for economic time series. The speech signal is assumed to follow a GH distribution because the GH distribution has heavier tails than the Gaussian distribution (GD) and covers other heavy-tailed distributions such as the hyperbolic, skewed \(t\), variance gamma (VG), normal inverse Gaussian (NIG), Cauchy, Student’s \(t\) and Laplace distributions. The noise signal is assumed to be uncorrelated (white noise), although in general that is not necessary. In the proposed method, heteroscedasticity is modeled by EGARCH. A kernel-smoothed function of the conditional variances and autocorrelations generates the soft detection vector. Finally, hard detection results from comparing the soft detection vector with an adaptive threshold value. The simulation results show that the proposed VAD is able to operate at signal-to-noise ratios down to \(-5\) dB.
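The pipeline described in the abstract (EGARCH filtering, kernel smoothing, adaptive thresholding) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and not taken from the paper: the EGARCH(1,1) parameters are fixed by hand rather than estimated, the innovations are treated as Gaussian rather than GH, the autocorrelation term is omitted, and the threshold rule (a multiple of a running noise-floor estimate) is hypothetical. All function names are likewise hypothetical.

```python
import math
import random


def egarch_variances(x, omega=-0.1, beta=0.95, alpha=0.2, theta=0.0):
    """EGARCH(1,1) conditional variances with fixed, hypothetical parameters."""
    e_abs_z = math.sqrt(2.0 / math.pi)  # E|z| for Gaussian z (simplifying assumption)
    log_var = omega / (1.0 - beta)      # start from the model's stationary level
    variances = []
    for sample in x:
        variances.append(math.exp(log_var))
        z = sample / math.sqrt(variances[-1])
        log_var = omega + beta * log_var + alpha * (abs(z) - e_abs_z) + theta * z
        log_var = max(-30.0, min(30.0, log_var))  # guard against overflow
    return variances


def kernel_smooth(values, bandwidth=5.0):
    """Nadaraya-Watson smoother over the time index with a Gaussian kernel."""
    n = len(values)
    half = int(3 * bandwidth)  # truncate the kernel at three bandwidths
    smoothed = []
    for t in range(n):
        num = den = 0.0
        for k in range(max(0, t - half), min(n, t + half + 1)):
            w = math.exp(-0.5 * ((t - k) / bandwidth) ** 2)
            num += w * values[k]
            den += w
        smoothed.append(num / den)
    return smoothed


def vad(x, n_init=50, factor=3.0, forget=0.98):
    """Return (soft, hard) detection vectors; the threshold rule is hypothetical."""
    soft = kernel_smooth(egarch_variances(x))
    head = soft[:n_init]                 # assumes the first samples are noise only
    noise_level = sum(head) / len(head)
    hard = []
    for s in soft:
        is_speech = s > factor * noise_level
        if not is_speech:  # adapt the noise floor only during detected silence
            noise_level = forget * noise_level + (1.0 - forget) * s
        hard.append(is_speech)
    return soft, hard


def demo_signal(seed=0):
    """White noise with a louder burst in the middle standing in for speech."""
    rng = random.Random(seed)
    sigma = [0.1] * 300 + [1.0] * 300 + [0.1] * 300
    return [rng.gauss(0.0, s) for s in sigma]
```

Calling `vad(demo_signal())` returns the smoothed conditional variances as the soft vector and a list of booleans as the hard decision. The synthetic burst is far louder than the background, so this toy example says nothing about low-SNR performance.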
Acknowledgments
The authors would like to thank the Editor and the referee for their careful reading and for their comments, which greatly improved the paper.
Appendix
1.1 Proof of (6)
Note that we can write
where \(f_W (w)\) denotes the pdf of \(W\). Substituting the form for \(f_W (\cdot )\) from (5), we obtain
Transforming \(w\) to
reduces (15) to
The result follows by the definition of the modified Bessel function of the third kind. \(\square \)
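The displays of this proof are not reproduced in this copy. As a hedged sketch only, assume the usual normal mean-variance mixture representation \(X \mid W = w \sim N(\mu + w\gamma , \eta ^2 w)\) with \(W\) a GIG\((\lambda , \chi , \psi )\) random variable. The shorthand \(a\), \(b\) and the constant \(C\) are introduced here and may differ from the paper's notation. Under those assumptions the argument would run along these lines:

```latex
\begin{align*}
f_X(x) &= \int_0^\infty \frac{1}{\sqrt{2\pi \eta^2 w}}
          \exp\left\{-\frac{(x-\mu-w\gamma)^2}{2\eta^2 w}\right\} f_W(w)\,dw, \\
f_W(w) &= C\, w^{\lambda-1} \exp\left\{-\tfrac12\left(\frac{\chi}{w}+\psi w\right)\right\},
\qquad C = \frac{\chi^{-\lambda}(\sqrt{\chi\psi})^{\lambda}}{2K_\lambda(\sqrt{\chi\psi})}, \\
f_X(x) &= \frac{C\, e^{(x-\mu)\gamma/\eta^2}}{\sqrt{2\pi}\,\eta}
          \int_0^\infty w^{\lambda-\frac32}
          \exp\left\{-\tfrac12\left(\frac{a}{w}+b\,w\right)\right\} dw,
\qquad a = \chi+\frac{(x-\mu)^2}{\eta^2},\quad b = \psi+\frac{\gamma^2}{\eta^2}, \\
&= \frac{C\, e^{(x-\mu)\gamma/\eta^2}}{\sqrt{2\pi}\,\eta}
   \left(\frac{a}{b}\right)^{\frac{\lambda}{2}-\frac14}
   \int_0^\infty y^{\lambda-\frac32}
   \exp\left\{-\frac{\sqrt{ab}}{2}\left(y+\frac1y\right)\right\} dy
   \qquad (w = \sqrt{a/b}\,y), \\
&= \frac{2C\, e^{(x-\mu)\gamma/\eta^2}}{\sqrt{2\pi}\,\eta}
   \left(\frac{a}{b}\right)^{\frac{\lambda}{2}-\frac14}
   K_{\lambda-\frac12}\left(\sqrt{ab}\right).
\end{align*}
```

The last step uses the integral representation \(K_\nu (x) = \frac{1}{2}\int _0^\infty y^{\nu -1} \exp \{-\frac{x}{2}(y + 1/y)\}\,dy\) with \(\nu = \lambda - \frac{1}{2}\).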
1.2 Mean and variance of the GH distribution
If \(W\) is a GIG random variable then
In addition, \(\mathbb E (X) = \mu + \mathbb E (W) \gamma \) and \(\mathrm{Var} (X) = \eta ^2 \mathbb E (W) + \gamma ^2 \mathrm{Var} (W)\). \(\square \)
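The mean and variance formulas above can be checked numerically. The sketch below is illustrative only and its function names are hypothetical. It assumes the GIG density is proportional to \(w^{\lambda -1}\exp \{-\frac{1}{2}(\chi /w + \psi w)\}\) and obtains the moments of \(W\) by quadrature rather than from Bessel functions, so that only the standard library is needed.

```python
import math


def gig_moment(order, lam, chi, psi, n=4000, lo=-30.0, hi=30.0):
    """E[W**order] for W ~ GIG(lam, chi, psi), by trapezoidal quadrature in u = log w."""
    step = (hi - lo) / n
    num = den = 0.0
    for i in range(n + 1):
        u = lo + i * step
        w = math.exp(u)
        # log of the unnormalised density w**(lam-1) * exp(-(chi/w + psi*w)/2),
        # multiplied by the Jacobian dw = w du
        log_g = lam * u - 0.5 * (chi / w + psi * w)
        weight = 0.5 if i in (0, n) else 1.0  # trapezoid end-point weights
        log_num = log_g + order * u
        if log_num > -700.0:                  # skip terms that underflow to zero
            num += weight * math.exp(log_num)
        if log_g > -700.0:
            den += weight * math.exp(log_g)
    return num / den


def gh_mean_var(lam, chi, psi, mu, eta2, gamma):
    """Mean and variance of X from E(X) = mu + E(W)*gamma and
    Var(X) = eta2*E(W) + gamma**2 * Var(W)."""
    m1 = gig_moment(1, lam, chi, psi)
    m2 = gig_moment(2, lam, chi, psi)
    var_w = m2 - m1 * m1
    return mu + m1 * gamma, eta2 * m1 + gamma ** 2 * var_w
```

For \(\lambda = -1/2\), \(\chi = \psi = 1\) the mixing variable is inverse Gaussian with mean 1 and variance 1, so `gh_mean_var(-0.5, 1.0, 1.0, 0.0, 1.0, 0.5)` should return approximately `(0.5, 1.25)`.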
1.3 Proof of Proposition 1
Let \(X \sim GH (\lambda , \chi , \psi , \mu , \eta ^2, \gamma )\). By (2),
where \(H (\theta ) = \mathbb E [\exp (-\theta W)]\) is the Laplace transform of a GIG random variable.
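The display introduced by "By (2)" is not reproduced in this copy. Assuming (2) is the normal mean-variance mixture representation \(X \mid W \sim N(\mu + W\gamma , \eta ^2 W)\), which is an assumption and not confirmed by the text available here, conditioning on \(W\) would give the characteristic function of \(X\) as follows:

```latex
\begin{align*}
\phi_X(t) = \mathbb E\left[e^{itX}\right]
  &= \mathbb E\left[\mathbb E\left(e^{itX}\mid W\right)\right] \\
  &= \mathbb E\left[\exp\left\{it(\mu+W\gamma)-\tfrac12 t^2\eta^2 W\right\}\right] \\
  &= e^{it\mu}\, H\left(\tfrac12 t^2\eta^2 - it\gamma\right).
\end{align*}
```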
Let \(Y=c+\sum ^n_{j=1} b_jX_j \) and let \(X_j \sim GH (\lambda , \chi , \psi , \mu _j, \eta ^2_j, \gamma _j)\) for each \(j\). Then,
The result follows. \(\square \)
Cite this article
Salemi, U.H., Rezaei, S. & Nadarajah, S. VAD Based on Kernel Smoothed Function of EGARCH Models. Wireless Pers Commun 72, 299–313 (2013). https://doi.org/10.1007/s11277-013-1015-1