Voice Activity Detection Using an Improved Unvoiced Feature Normalization Process in Noisy Environments

Chung, Kyungyong; Oh, Sang Yeob

doi:10.1007/s11277-015-3169-5

Published: 31 December 2015

Wireless Personal Communications, Volume 89, pages 747–759 (2016)

Kyungyong Chung & Sang Yeob Oh

Abstract

Noise-elimination technology removes noise, including environmental noise, from voice signals in order to increase voice recognition rates. Noise estimation is the most important factor in noise-elimination technology. One effective estimation method is voice activity detection, which estimates noise from the statistical properties of noise and voice, each modeled as an independent Gaussian distribution. The method is very reliable when the statistical properties differ sharply, as with white noise, but it is limited for signals with a low signal-to-noise ratio (SNR) or with speech-shaped noise, whose statistical properties resemble those of voice signals. When the noise is estimated incorrectly, residual noise remains, and methods intended to raise the voice recognition rate instead lose recognition performance through distortion of the voice spectrum and missed voice frames. Recognition performance also degrades when the model training environment differs from the voice recognition environment. Various silence feature normalization methods are used to reduce this environmental mismatch. Existing silence feature normalization degrades recognition performance because, at a low SNR, the energy level in silence sections rises and the classification accuracy for voiced and unvoiced signals falls. This paper proposes a robust voice characteristic detection method for noisy environments that uses feature extraction and unvoiced feature normalization to classify voiced and unvoiced signals. The proposed method builds a recognition model by extracting the characteristics used to classify voiced and unvoiced signals in a high-SNR environment. At a low SNR, the model uses the distribution of the cepstrum features of voiced and unvoiced signals, which leaves the voice characteristics less affected by noise and improves recognition performance. Recognition experiments were used to check whether the model improves recognition performance relative to the existing method.
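
The statistical approach mentioned in the abstract can be illustrated with a per-frame likelihood-ratio test. The sketch below is not the authors' method: it is a generic Gaussian statistical-model detector, assuming zero-mean complex Gaussian DFT coefficients for noise and voice and replacing the a priori SNR with its maximum-likelihood estimate. The function names, the frame length, the tone amplitude, and the `threshold` value are all hypothetical choices made for illustration.

```python
import cmath
import math
import random


def power_spectrum(frame):
    """Return |X_k|^2 for k = 0..N/2 using a naive DFT (adequate for short frames)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))) ** 2
        for k in range(n // 2 + 1)
    ]


def mean_log_likelihood_ratio(power, noise_power):
    """Average per-bin log-likelihood ratio of 'voice present' vs 'noise only'.

    With gamma = power / noise_power (a posteriori SNR) and the a priori SNR
    set to its maximum-likelihood estimate max(gamma - 1, 0), the per-bin
    ratio reduces to gamma - 1 - log(gamma) for gamma > 1 and 0 otherwise.
    """
    total = 0.0
    for p, lam in zip(power, noise_power):
        gamma = p / lam
        if gamma > 1.0:
            total += gamma - 1.0 - math.log(gamma)
    return total / len(power)


def detect_voice(frames, noise_frames, threshold=1.0):
    """Flag each frame as voice (True) or noise (False).

    noise_frames are assumed to contain noise only and are used to estimate
    the per-bin noise power. threshold is a hypothetical tuning value.
    """
    spectra = [power_spectrum(f) for f in noise_frames]
    bins = len(spectra[0])
    noise_power = [sum(s[k] for s in spectra) / len(spectra) for k in range(bins)]
    return [
        mean_log_likelihood_ratio(power_spectrum(f), noise_power) > threshold
        for f in frames
    ]


def demo():
    """Synthetic check: one white-noise frame and one frame with a tone added."""
    rng = random.Random(0)
    n = 64

    def noise():
        return [rng.gauss(0.0, 1.0) for _ in range(n)]

    tone = [4.0 * math.sin(2 * math.pi * 5 * t / n) for t in range(n)]
    noise_only = [noise() for _ in range(20)]
    test_frames = [noise(), [a + b for a, b in zip(tone, noise())]]
    return detect_voice(test_frames, noise_only)


print(demo())
```

In this synthetic setting the tone concentrates its energy in one bin, so its averaged ratio sits well above the threshold, while a white-noise frame averages far below it. Speech-shaped noise or a low SNR would narrow that gap, which is the limitation the abstract describes.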

Acknowledgments

This research was supported by the Gachon University research fund of 2015 (GCU-2015-0085).

Author information

Authors and Affiliations

School of Computer Information Engineering, Sangji University, 83, Sangjidae-gil, Wonju-si, Gangwon-do, 220-702, Korea
Kyungyong Chung
Department of Computer Engineering, Gachon University, Bokjeong-dong, Sujeong-gu, Seongnam-si, Gyeonggi-do, 461-701, Korea
Sang Yeob Oh

Corresponding author

Correspondence to Sang Yeob Oh.

About this article

Cite this article

Chung, K., Oh, S.Y. Voice Activity Detection Using an Improved Unvoiced Feature Normalization Process in Noisy Environments. Wireless Pers Commun 89, 747–759 (2016). https://doi.org/10.1007/s11277-015-3169-5

Issue Date: August 2016
DOI: https://doi.org/10.1007/s11277-015-3169-5
