Abstract
In this work, we model speech samples with a two-sided generalized Gamma distribution and evaluate its efficiency for voice activity detection. Using a computationally inexpensive maximum likelihood approach, we employ the Bayesian Information Criterion to identify phoneme boundaries in noisy speech.
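The general idea of a BIC-based boundary test can be illustrated with a minimal sketch. This is not the authors' implementation. As a simplifying assumption it swaps in a zero-mean Laplacian, usually described as a special case of the two-sided generalized Gamma family, because its maximum likelihood scale has a closed form. The function names, the `margin` and `penalty` parameters, and the synthetic signal are all hypothetical.

```python
import math
import random


def laplace_loglik(x):
    """Maximized log-likelihood of a zero-mean Laplacian fitted to x.

    The ML scale is b = mean(|x|), so sum(|x|) / b equals n at the optimum.
    """
    n = len(x)
    b = max(sum(abs(v) for v in x) / n, 1e-12)  # guard against log(0)
    return -n * math.log(2.0 * b) - n


def delta_bic(x, t, penalty=1.0):
    """BIC gain of splitting x at index t versus modelling it as one segment.

    The two-segment model has one extra scale parameter, so the penalty
    term is 0.5 * penalty * log(n). A positive value favours a boundary.
    """
    gain = laplace_loglik(x[:t]) + laplace_loglik(x[t:]) - laplace_loglik(x)
    return gain - 0.5 * penalty * math.log(len(x))


def best_boundary(x, margin=10, penalty=1.0):
    """Return (index, score) of the best split, or (None, score) if no
    candidate split has a positive BIC gain."""
    if len(x) <= 2 * margin:
        raise ValueError("signal too short for the chosen margin")
    t = max(range(margin, len(x) - margin),
            key=lambda i: delta_bic(x, i, penalty))
    score = delta_bic(x, t, penalty)
    return (t, score) if score > 0 else (None, score)


if __name__ == "__main__":
    # Synthetic stand-in for "noise then speech": a quiet segment followed
    # by a louder one, both Laplacian distributed (assumed test data).
    rng = random.Random(0)

    def sample(scale):
        return rng.choice((-1, 1)) * rng.expovariate(1.0 / scale)

    signal = [sample(0.1) for _ in range(200)] + [sample(2.0) for _ in range(200)]
    print(best_boundary(signal))
```

A full generalized Gamma fit would replace `laplace_loglik` with a numerical maximum likelihood estimate of the shape parameters and increase the parameter count in the penalty term accordingly.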
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Almpanidis, G., Kotropoulos, C. (2006). Voice Activity Detection Using Generalized Gamma Distribution. In: Antoniou, G., Potamias, G., Spyropoulos, C., Plexousakis, D. (eds) Advances in Artificial Intelligence. SETN 2006. Lecture Notes in Computer Science, vol 3955. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11752912_3
DOI: https://doi.org/10.1007/11752912_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-34117-8
Online ISBN: 978-3-540-34118-5
eBook Packages: Computer Science (R0)