Enhancing speech intelligibility in reverberant spaces by a speech features distributions dependent pre-processing

International Journal of Speech Technology

Abstract

In this paper, we present a pre-processing method based on speech envelope modulation for enhancing intelligibility in large, reverberant, enclosed public spaces. In such conditions, reverberation blurs the speech signal and degrades its perception. This blurring results from the masking of consonants by the reverberant tails of the preceding vowels, and it is particularly pronounced for elderly persons suffering from presbycusis. The proposed pre-processing is inspired by the steady-state suppression technique, which detects the steady-state portions of speech and multiplies their waveforms by an attenuation coefficient in order to reduce their masking effect. Whereas steady-state suppression is performed in the frequency domain, the pre-processing described in this paper is performed in the time domain. Its key novelty is the detection of voiced speech segments using a priori knowledge of the distributions of the powers and durations of voiced and unvoiced phonemes. The performance of this pre-processing is evaluated with an objective criterion and with subjective listening tests involving normal-hearing listeners, using a set of nonsense Vowel–Consonant–Vowel syllables and railway station vocal announcements.
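
The general idea of time-domain attenuation of high-power segments can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not give the detection rule, so the fixed power threshold, the minimum duration, the gain value, and all function names below are hypothetical placeholders standing in for decisions that the paper derives from phoneme power and duration distributions.

```python
def frame_powers(signal, frame_len):
    """Mean power of each non-overlapping frame (a trailing partial frame is dropped)."""
    n_frames = len(signal) // frame_len
    return [
        sum(x * x for x in signal[i * frame_len:(i + 1) * frame_len]) / frame_len
        for i in range(n_frames)
    ]


def detect_segments(powers, power_threshold, min_frames):
    """Return (start, end) frame index pairs for runs of frames whose power
    exceeds power_threshold and that last at least min_frames frames.
    Both parameters are assumed values, not taken from the paper."""
    segments = []
    start = None
    # The -inf sentinel closes a run that reaches the end of the signal.
    for i, p in enumerate(list(powers) + [float("-inf")]):
        if p > power_threshold and start is None:
            start = i
        elif p <= power_threshold and start is not None:
            if i - start >= min_frames:
                segments.append((start, i))
            start = None
    return segments


def attenuate_segments(signal, segments, frame_len, gain):
    """Multiply the waveform by gain inside each detected segment (time domain)."""
    out = list(signal)
    for start, end in segments:
        for n in range(start * frame_len, end * frame_len):
            out[n] *= gain
    return out


# Toy signal: quiet, loud, quiet (constant amplitudes keep the arithmetic simple).
FRAME_LEN = 10
signal = [0.05] * 40 + [1.0] * 80 + [0.05] * 40
powers = frame_powers(signal, FRAME_LEN)
segments = detect_segments(powers, power_threshold=0.1, min_frames=3)
processed = attenuate_segments(signal, segments, FRAME_LEN, gain=0.4)
```

In this toy example the loud middle portion is detected as a single segment and scaled by the assumed gain, while the quiet portions are left untouched. A real system would replace the fixed threshold and minimum duration with criteria based on the voiced and unvoiced phoneme statistics described in the abstract.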

Author information

Corresponding author

Correspondence to Yosra Mzah.

Additional information

This work is a part of a European project entitled Intelligible City For All (http://www.icityforall.eu/) from the Active and Assisted Living Program (AAL 2011-4-056). The main objective of this project is to enhance the sense of security, self-confidence and comfort for elderly people suffering from presbycusis.

About this article

Cite this article

Mzah, Y., Chaoui, S. & Jaidane, M. Enhancing speech intelligibility in reverberant spaces by a speech features distributions dependent pre-processing. Int J Speech Technol 21, 773–781 (2018). https://doi.org/10.1007/s10772-018-9536-3

  • DOI: https://doi.org/10.1007/s10772-018-9536-3
