Performance Evaluation of Silence-Feature Normalization Model using Cepstrum Features of Noise Signals

Oh, SangYeob; Chung, Kyungyong

doi:10.1007/s11277-017-4645-x

Published: 24 July 2017

Volume 98, pages 3287–3297 (2018)
Wireless Personal Communications

SangYeob Oh¹ & Kyungyong Chung²

Abstract

Speech enhancement algorithms play an important role in speech signal processing, and many have been studied over the past several decades. A typical speech enhancement algorithm combines a noise removal method with a statistical model filter and analyzes the speech signal in the frequency domain; spectral subtraction and the Wiener filter are representative examples. These algorithms enhance speech well in general, but their performance deteriorates under specific noise types or in low signal-to-noise ratio (SNR) environments. In addition, when the noise is estimated incorrectly, noise remains in the voice signal, so the spectrum corresponding to the voice signal is distorted or frames corresponding to the voice signal cannot be retrieved, and voice recognition performance deteriorates. This deterioration arises from the mismatch between the conditions at recognition time and the training model. We use a silence-feature normalization model to improve the recognition rate lost to this mismatch in noisy environments. Conventional silence-feature normalization has the problem that the energy of the silent part increases, which blurs the boundary between voiced and unvoiced segments and degrades recognition performance. In this study, we use the cepstrum feature of the noise signals in the silence-feature normalization model, setting a reference value for voiced and unvoiced classification, to improve the performance of silence-feature normalization for signals with a low SNR. When the recognition rate was measured, it improved compared with other methods.
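The general idea of classifying frames against a reference value on cepstrum features of the noise can be illustrated with a minimal Python sketch. It is not the authors' implementation, which this page does not give. Everything specific here is an assumption made for illustration: the function names (real_cepstrum, cepstral_distance, silence_feature_normalize), the Euclidean cepstral distance, the premise that the first frames contain only noise, the threshold of 0.3, the log-energy floor of -10.0, and the toy frames.

```python
import cmath
import math


def real_cepstrum(frame, n_coeffs=8):
    """Return the first n_coeffs real-cepstrum coefficients of one frame."""
    n = len(frame)
    # Naive DFT; adequate for short illustrative frames.
    spectrum = [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        for k in range(n)
    ]
    # Small constant avoids log(0) at spectral nulls.
    log_mag = [math.log(abs(s) + 1e-10) for s in spectrum]
    # Inverse DFT of the log-magnitude spectrum (real part only).
    return [
        sum(v * cmath.exp(2j * math.pi * k * q / n) for k, v in enumerate(log_mag)).real / n
        for q in range(n_coeffs)
    ]


def cepstral_distance(c_a, c_b):
    """Euclidean distance between cepstral vectors, skipping c[0] (overall level)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c_a[1:], c_b[1:])))


def log_energy(frame):
    """Log of the frame energy, with a small constant to avoid log(0)."""
    return math.log(sum(x * x for x in frame) + 1e-10)


def silence_feature_normalize(frames, n_noise_frames=2, threshold=0.3, floor=-10.0):
    """Label frames as speech or silence and flatten the silence log-energy.

    Assumption: the first n_noise_frames frames contain noise only, so the
    mean of their cepstra serves as the noise reference. `threshold` plays
    the role of the reference value; it is a hypothetical choice.
    """
    cepstra = [real_cepstrum(f) for f in frames]
    lead = cepstra[:n_noise_frames]
    noise_ref = [sum(col) / len(lead) for col in zip(*lead)]
    labels, energies = [], []
    for frame, cep in zip(frames, cepstra):
        is_speech = cepstral_distance(cep, noise_ref) > threshold
        labels.append(is_speech)
        # Silence frames get a constant log-energy; speech frames keep theirs.
        energies.append(log_energy(frame) if is_speech else floor)
    return labels, energies


# Toy frames: "noise" keeps one spectral shape at different gains,
# "speech" has a different spectral shape.
noise = [1.0, 0.5] + [0.0] * 30
speech = [1.0, -0.9] + [0.0] * 30
frames = [noise, [0.2 * x for x in noise], speech, [3.0 * x for x in noise]]
labels, energies = silence_feature_normalize(frames)
print(labels)  # prints [False, False, True, False]
```

Because the distance skips c[0], a change in noise level alone does not move a frame across the threshold; only a change in spectral shape does. That is one plausible reason to prefer a cepstral reference over a pure energy threshold at low SNR, though the article's actual criterion may differ.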

Acknowledgements

This research was supported by the Ministry of Science, ICT & Future Planning (MSIP), Korea, under the National Program for Excellence in SW (R7015-16-1003), supervised by the Institute for Information & Communications Technology Promotion (IITP).

Author information

Authors and Affiliations

Department of Computer Engineering, Gachon University, Bokjeong-dong, Sujeong-gu, Seongnam-si, Gyeonggi-do, 461-701, Korea
SangYeob Oh
Department of Computer Science, Kyonggi University, 154-42, Gwanggyosan-ro, Yeongtong-gu, Suwon-si, Gyeonggi-do, Korea
Kyungyong Chung

Corresponding author

Correspondence to Kyungyong Chung.

About this article

Cite this article

Oh, S., Chung, K. Performance Evaluation of Silence-Feature Normalization Model using Cepstrum Features of Noise Signals. Wireless Pers Commun 98, 3287–3297 (2018). https://doi.org/10.1007/s11277-017-4645-x

Issue Date: February 2018
