Abstract
This paper presents a pre-processing method based on speech envelope modulation that enhances intelligibility in large, reverberant, enclosed public spaces. In such conditions, reverberation blurs the speech signal and degrades its perception: the reverberant tails of preceding vowels mask the consonants that follow them. This effect is particularly pronounced for elderly people suffering from presbycusis. The proposed pre-processing is inspired by the steady-state suppression technique, which detects the steady-state portions of speech and multiplies their waveforms by an attenuation coefficient in order to reduce their masking effect. Whereas steady-state suppression is performed in the frequency domain, the pre-processing described here is performed in the time domain. Its key novelty lies in detecting the voiced segments of speech using a priori knowledge of the distributions of the powers and durations of voiced and unvoiced phonemes. The performance of this pre-processing is evaluated with an objective criterion and with subjective listening tests involving normal-hearing listeners, using a set of nonsense Vowel–Consonant–Vowel syllables and railway station vocal announcements.
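The time-domain idea outlined above (detect long, high-power segments, then scale their waveform by an attenuation coefficient) can be illustrated with a minimal sketch. It is not the authors' detector: the abstract states that detection relies on a priori distributions of phoneme powers and durations, which are not given here, so the sketch substitutes a fixed power threshold and a fixed minimum duration. The function names, `power_threshold`, `min_frames`, and `gain` are all hypothetical choices made for illustration.

```python
import math

def frame_powers(signal, frame_len):
    """Mean power of each non-overlapping frame (a trailing partial frame is included)."""
    powers = []
    for start in range(0, len(signal), frame_len):
        frame = signal[start:start + frame_len]
        powers.append(sum(x * x for x in frame) / len(frame))
    return powers

def detect_segments(powers, power_threshold, min_frames):
    """Return (first_frame, end_frame_exclusive) runs whose power stays above
    power_threshold for at least min_frames consecutive frames.
    ASSUMPTION: fixed thresholds stand in for the paper's distribution-based rule."""
    segments = []
    run_start = None
    # A -inf sentinel closes any run that reaches the end of the signal.
    for i, p in enumerate(powers + [float("-inf")]):
        if p > power_threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if i - run_start >= min_frames:
                segments.append((run_start, i))
            run_start = None
    return segments

def attenuate_segments(signal, frame_len, power_threshold, min_frames, gain):
    """Multiply the waveform of every detected segment by gain (0 < gain < 1)."""
    out = list(signal)
    powers = frame_powers(signal, frame_len)
    for first, end in detect_segments(powers, power_threshold, min_frames):
        for n in range(first * frame_len, min(end * frame_len, len(out))):
            out[n] *= gain
    return out

# Synthetic example: a loud vowel-like tone followed by a weak consonant-like tail.
vowel = [math.sin(2 * math.pi * 0.05 * n) for n in range(400)]
consonant = [0.05 * math.sin(2 * math.pi * 0.3 * n) for n in range(200)]
signal = vowel + consonant
processed = attenuate_segments(signal, frame_len=100, power_threshold=0.1,
                               min_frames=3, gain=0.4)
```

In this toy signal only the first 400 samples are long and strong enough to be attenuated; the weak tail is left untouched, which is the intended effect of lowering the level of the portions whose reverberant tails would otherwise mask the following consonant.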








Additional information
This work is part of the European project Intelligible City For All (http://www.icityforall.eu/) of the Active and Assisted Living Program (AAL 2011-4-056). The main objective of this project is to improve the sense of security, self-confidence and comfort of elderly people suffering from presbycusis.
Cite this article
Mzah, Y., Chaoui, S. & Jaidane, M. Enhancing speech intelligibility in reverberant spaces by a speech features distributions dependent pre-processing. Int J Speech Technol 21, 773–781 (2018). https://doi.org/10.1007/s10772-018-9536-3
DOI: https://doi.org/10.1007/s10772-018-9536-3