
Performance Evaluation of Silence-Feature Normalization Model using Cepstrum Features of Noise Signals

Wireless Personal Communications

Abstract

Speech enhancement algorithms play an important role in speech signal processing, and many have been studied over the past several decades. A typical speech enhancement algorithm combines a noise-removal method with a statistical-model filter and analyzes the speech signal in the frequency domain; spectral subtraction and Wiener filtering are representative examples. Although these algorithms generally enhance speech well, their performance deteriorates for certain noise types and in low signal-to-noise ratio (SNR) environments. In addition, when the noise estimate is erroneous, noise remains in the speech signal, so the speech spectrum is distorted or speech frames cannot be recovered, and speech recognition performance deteriorates. This degradation arises from the mismatch between the conditions at recognition time and those of the trained model. To improve the recognition rate under such mismatched noisy conditions, we use a silence-feature normalization model. Conventional silence-feature normalization has the drawback that the energy of silent segments increases, which blurs the boundary between speech and silence and degrades recognition performance. In this study, we use the cepstral features of the noise signal in the silence-feature normalization model to set a reference value for voiced and unvoiced classification, thereby improving silence-feature normalization for signals with a low SNR. Recognition experiments confirm that the proposed method achieves higher recognition rates than the compared methods.
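The abstract describes the proposed method only at a high level. The Python sketch below illustrates the general idea of cepstrum-based silence-feature normalization; it is not the authors' implementation, and several details are assumptions: the first `n_noise` frames are taken to be noise-only and their average cepstrum serves as the noise reference; the frame-to-reference distance is a plain Euclidean distance over 12 low-order real-cepstrum coefficients; the reference value (threshold) for the speech/silence decision is the mean plus `margin` standard deviations of the noise-frame distances; and frames judged to be silence have their log-energy replaced by one constant (here, the smallest frame log-energy), while speech frames are left unchanged. All function names, parameter values, and the synthetic test signal are hypothetical, and a naive DFT is used so the example needs only the standard library.

```python
import math
import random
import statistics


def split_frames(signal, size=64, hop=32):
    """Split a 1-D signal into overlapping frames, dropping the incomplete tail."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, hop)]


def log_energy(frame, floor=1e-10):
    """Frame log-energy; `floor` avoids log(0) on all-zero frames."""
    return math.log(sum(x * x for x in frame) + floor)


def real_cepstrum(frame, n_coeffs=12, floor=1e-10):
    """Low-order real cepstrum: inverse DFT of the log-magnitude spectrum.

    Uses a naive O(N^2) DFT so the sketch needs only the standard library.
    """
    n = len(frame)
    log_mag = []
    for k in range(n):
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        log_mag.append(math.log(math.hypot(re, im) + floor))
    # For a real frame, log|X[k]| is real and even, so its inverse DFT is a cosine sum.
    return [sum(v * math.cos(2 * math.pi * k * q / n) for k, v in enumerate(log_mag)) / n
            for q in range(n_coeffs)]


def cepstral_distance(c1, c2):
    """Assumption: plain Euclidean distance between cepstral vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c1, c2)))


def silence_feature_normalize(signal, size=64, hop=32, n_noise=15, margin=3.0):
    """Hypothetical cepstrum-based silence-feature normalization.

    Returns (is_speech flags, raw log-energies, normalized log-energies).
    """
    frames = split_frames(signal, size, hop)
    ceps = [real_cepstrum(f) for f in frames]
    # Assumption: the first n_noise frames are noise only; their mean cepstrum is the reference.
    ref = [statistics.fmean(col) for col in zip(*ceps[:n_noise])]
    dist = [cepstral_distance(c, ref) for c in ceps]
    # Assumption: reference value = mean + margin * std of the noise-frame distances.
    noise_d = dist[:n_noise]
    threshold = statistics.fmean(noise_d) + margin * statistics.stdev(noise_d)
    is_speech = [d > threshold for d in dist]
    log_e = [log_energy(f) for f in frames]
    # Silence frames get one constant log-energy; speech frames are left unchanged.
    silence_value = min(log_e)
    normalized = [e if s else silence_value for e, s in zip(log_e, is_speech)]
    return is_speech, log_e, normalized


# Hypothetical test signal: low-level noise, a two-tone "voiced" segment, noise again.
rng = random.Random(0)


def white_noise(n, std=0.01):
    return [rng.gauss(0.0, std) for _ in range(n)]


tone = [0.5 * math.sin(2 * math.pi * 0.05 * t) + 0.25 * math.sin(2 * math.pi * 0.15 * t)
        for t in range(960)]
signal = white_noise(640) + [s + w for s, w in zip(tone, white_noise(960))] + white_noise(640)

is_speech, log_e, normalized = silence_feature_normalize(signal)
print(sum(is_speech), "of", len(is_speech), "frames classified as speech")
```

In this sketch, pinning every silence frame to one constant is what stops noise-driven energy fluctuations in non-speech regions from reaching the recognizer, so the scheme stands or falls on the reference value; that is why the threshold is derived from the observed noise frames rather than fixed globally.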


Figs. 1–5 (not reproduced in this text)



Acknowledgements

This research was supported by the Ministry of Science, ICT and Future Planning (MSIP), Korea, under the National Program for Excellence in SW (R7015-16-1003), supervised by the Institute for Information & communications Technology Promotion (IITP).

Author information

Corresponding author

Correspondence to Kyungyong Chung.


About this article


Cite this article

Oh, S., Chung, K. Performance Evaluation of Silence-Feature Normalization Model using Cepstrum Features of Noise Signals. Wireless Pers Commun 98, 3287–3297 (2018). https://doi.org/10.1007/s11277-017-4645-x

