Abstract
Speech enhancement algorithms play an important role in speech signal processing, and many have been studied over the past several decades. A typical speech enhancement algorithm combines a noise removal method with a statistical model filter and analyzes the speech signal in the frequency domain; spectral subtraction and the Wiener filter are representative examples. These algorithms generally enhance speech well, but their performance deteriorates under certain noise types and in low signal-to-noise ratio (SNR) environments. In addition, when the noise is estimated incorrectly, residual noise remains in the speech signal, so that the speech spectrum is distorted or speech frames cannot be detected, and speech recognition performance deteriorates. This deterioration arises from the mismatch between the recognition conditions and the training model. We use a silence-feature normalization model to improve the recognition rate degraded by this mismatch in noisy environments. Conventional silence-feature normalization has the problem that the energy of the silent segments increases, which blurs the boundary between speech and non-speech and degrades recognition performance. In this study, we use the cepstrum feature of the noise signal in the silence-feature normalization model to set a reference value for voiced/unvoiced classification, which improves the performance of silence-feature normalization for signals with a low SNR. In the recognition experiments, the proposed method achieves a higher recognition rate than the other methods compared.
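The idea of using a noise cepstrum as the reference for speech/non-speech classification can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the frame sizes, the use of the leading frames as the noise estimate, the mean-plus-k-standard-deviations threshold, and the choice to clamp non-speech log-energy to the global minimum are all hypothetical choices made for the example.

```python
import numpy as np

def frame_signal(x, frame_len=256, hop=128):
    """Split a 1-D signal into overlapping Hamming-windowed frames."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx] * np.hamming(frame_len)

def real_cepstrum(frames, n_coef=13):
    """Low-order real cepstrum per frame; c0 is dropped so level is ignored."""
    log_mag = np.log(np.abs(np.fft.rfft(frames, axis=1)) + 1e-10)
    cep = np.fft.irfft(log_mag, axis=1)
    return cep[:, 1:n_coef]

def silence_feature_normalize(x, noise_frames=10, k=3.0):
    """Classify frames by cepstral distance to a noise reference, then
    flatten the log-energy of non-speech frames (assumed normalization)."""
    frames = frame_signal(x)
    log_e = np.log(np.sum(frames ** 2, axis=1) + 1e-10)
    cep = real_cepstrum(frames)

    # Assumption: the first `noise_frames` frames contain noise only.
    noise_cep = cep[:noise_frames].mean(axis=0)
    dist = np.linalg.norm(cep - noise_cep, axis=1)

    # Reference value for the speech/non-speech decision (assumed rule).
    lead = dist[:noise_frames]
    threshold = lead.mean() + k * lead.std()
    is_speech = dist > threshold

    # Non-speech frames get one constant log-energy value.
    norm_log_e = np.where(is_speech, log_e, log_e.min())
    return is_speech, norm_log_e
```

Because the decision uses the cepstral shape rather than the energy level, a rise in silent-segment energy at low SNR does not by itself move frames across the threshold, which is the motivation stated in the abstract.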
Acknowledgements
This research was supported by the Ministry of Science, ICT & Future Planning (MSIP), Korea, under the National Program for Excellence in SW (R7015-16-1003), supervised by the Institute for Information & communications Technology Promotion (IITP).
Cite this article
Oh, S., Chung, K. Performance Evaluation of Silence-Feature Normalization Model using Cepstrum Features of Noise Signals. Wireless Pers Commun 98, 3287–3297 (2018). https://doi.org/10.1007/s11277-017-4645-x
DOI: https://doi.org/10.1007/s11277-017-4645-x