Abstract
In this paper, we present an approach that incorporates discriminative weight training into a statistical model-based voice activity detection (VAD) method. In our approach, the VAD decision rule is derived from optimally weighted likelihood ratios (LRs), with the weights obtained using a minimum classification error (MCE) method. To improve performance, we devise an adaptive on-line means of selecting two kinds of weights based on a power spectral flatness measure (PSFM). Compared with conventional schemes under various noise conditions, the proposed approach shows better performance.
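The decision rule summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the per-bin log LR of the complex Gaussian model commonly used in statistical model-based VAD, takes the spectral flatness as the ratio of the geometric to the arithmetic mean of the power spectrum, and reads "selecting two kinds of weights" as choosing one of two pre-trained weight sets per frame. The function names, both thresholds, and the weight values are hypothetical, and the MCE training that would produce the weights is not shown.

```python
import math


def spectral_flatness(power):
    """Geometric mean divided by arithmetic mean of a power spectrum.

    Values near 1 indicate a flat (noise-like) spectrum; values near 0
    indicate a peaky one. All entries of `power` must be positive.
    """
    log_mean = sum(math.log(p) for p in power) / len(power)
    return math.exp(log_mean) / (sum(power) / len(power))


def log_likelihood_ratios(power, noise_power, xi):
    """Per-bin log LR under an assumed complex Gaussian model.

    power:       noisy-speech power per frequency bin
    noise_power: noise variance estimate per bin
    xi:          a priori SNR estimate per bin
    """
    llrs = []
    for p, n, x in zip(power, noise_power, xi):
        gamma = p / n  # a posteriori SNR
        llrs.append(gamma * x / (1.0 + x) - math.log(1.0 + x))
    return llrs


def vad_decision(power, noise_power, xi, weights_flat, weights_peaky,
                 flatness_threshold=0.5, decision_threshold=0.0):
    """Return True when the weighted sum of log LRs exceeds the threshold.

    Hypothetical selection rule: use `weights_flat` when the frame's
    spectral flatness is at least `flatness_threshold`, otherwise use
    `weights_peaky`. Each weight set is assumed to sum to 1.
    """
    flatness = spectral_flatness(power)
    weights = weights_flat if flatness >= flatness_threshold else weights_peaky
    llrs = log_likelihood_ratios(power, noise_power, xi)
    score = sum(w * l for w, l in zip(weights, llrs))
    return score > decision_threshold


# Usage with made-up numbers: four bins and uniform weights.
uniform = [0.25] * 4
loud_frame = vad_decision([4.0] * 4, [1.0] * 4, [1.0] * 4, uniform, uniform)
quiet_frame = vad_decision([0.5] * 4, [1.0] * 4, [0.01] * 4, uniform, uniform)
```

With uniform weights the score reduces to the plain average of the log LRs, so any gain in this sketch would come only from weights that emphasize the more discriminative frequency bins.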
Additional information
This work was supported by the MKE (Ministry of Knowledge Economy), Korea, under the ITRC (Information Technology Research Center) support program supervised by the IITA (Institute for Information Technology Advancement) (IITA-2008-C1090-0902-0010), and by the MKE and KOTEF through the Human Resource Training Project for Strategic Technology.
Cite this article
Kang, SI., Chang, JH. Voice Activity Detection Based on Discriminative Weight Training Incorporating a Spectral Flatness Measure. Circuits Syst Signal Process 29, 183–194 (2010). https://doi.org/10.1007/s00034-009-9141-4
DOI: https://doi.org/10.1007/s00034-009-9141-4