Abstract
Noise is one of the major challenges in speech recognition systems. Different kinds of noise (such as pink, white and leopard noise) degrade a speech signal in different ways and to different degrees. This study measures how resistant the formants of a speech signal are to conventional noises, that is, how far the formant frequencies are displaced when noise is added. The methodology was to add each noise to the original voice signal and then measure the resulting displacement of the formant locations. The paper introduces the mean square movement (MSM) parameter, which represents the deviation of the formant frequencies caused by each noise. All investigations were conducted under three signal-to-noise ratio (SNR) conditions (5, 10 and 15 dB), which allowed the influence of SNR on the MSM parameter and on the extent of formant displacement to be assessed. The results indicate that, at all three SNR levels, the formant frequencies were resistant to machine gun noise, whilst white noise caused the most measurable displacement of the formant frequencies.
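The procedure described in the abstract (add noise at a fixed SNR, then quantify how far the formants move) can be illustrated with a minimal Python sketch. The abstract does not give the exact MSM formula, so `mean_square_movement` below is an assumed definition (the mean of the squared formant frequency shifts). The function names, the synthetic test tone and the example formant values are all hypothetical and are not taken from the paper.

```python
import numpy as np


def add_noise_at_snr(signal, noise, snr_db):
    """Scale `noise` so the mixture has the requested SNR (in dB), then add it."""
    signal = np.asarray(signal, dtype=float)
    noise = np.asarray(noise, dtype=float)
    if len(noise) < len(signal):
        raise ValueError("noise must be at least as long as the signal")
    noise = noise[: len(signal)]
    p_signal = np.mean(signal ** 2)
    p_noise = np.mean(noise ** 2)
    # Choose the gain so that p_signal / (gain**2 * p_noise) == 10**(snr_db / 10).
    gain = np.sqrt(p_signal / (p_noise * 10 ** (snr_db / 10)))
    return signal + gain * noise


def measured_snr_db(clean, noisy):
    """SNR in dB, treating (noisy - clean) as the noise component."""
    clean = np.asarray(clean, dtype=float)
    residual = np.asarray(noisy, dtype=float) - clean
    return 10 * np.log10(np.mean(clean ** 2) / np.mean(residual ** 2))


def mean_square_movement(clean_formants, noisy_formants):
    """Assumed MSM definition: mean squared formant shift, in Hz^2."""
    clean = np.asarray(clean_formants, dtype=float)
    noisy = np.asarray(noisy_formants, dtype=float)
    return float(np.mean((noisy - clean) ** 2))


# Synthetic stand-ins: a 440 Hz tone as the "speech" and Gaussian white noise.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
clean = np.sin(2 * np.pi * 440 * t)
white = rng.standard_normal(len(clean))

for snr in (5, 10, 15):  # the three SNR conditions named in the abstract
    noisy = add_noise_at_snr(clean, white, snr)
    print(snr, round(measured_snr_db(clean, noisy), 2))

# Hypothetical formant frequencies (Hz) before and after adding noise.
print(mean_square_movement([500, 1500, 2500], [510, 1480, 2500]))
```

In a real experiment the formant frequencies would come from a formant estimator (for example one based on linear prediction) applied to the clean and the noisy signal. That step is omitted here because the abstract does not say which estimator was used.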
Cite this article
Sadeghi, M., Marvi, H. & Ali, M. The effect of different acoustic noise on speech signal formant frequency location. Int J Speech Technol 21, 741–752 (2018). https://doi.org/10.1007/s10772-018-9540-7
DOI: https://doi.org/10.1007/s10772-018-9540-7