Abstract
It is important to know the degree to which convolutive noise disrupts the perceptual quality and intelligibility of speech. This paper presents an ideal binary masking criterion for reducing convolutive noise (reverberation) and improving the quality and intelligibility of speech. The noise is suppressed using ideal binary time–frequency (T–F) masking based on the signal-to-reverberation ratio (SRR) of individual T–F channels. All T–F channels with an SRR greater than a pre-selected threshold are retained, while the others are eliminated. The performance of the algorithm is evaluated using IEEE sentences corrupted with reverberation times (RT60) ranging from 0.3 to 2.0 s. The results indicate that the intelligibility and perceptual quality of speech decrease as the reverberation time increases. Additional analyses indicate that ideal binary masking reduces the temporal envelope spreading introduced by reverberation. The algorithm is evaluated with perceptual evaluation of speech quality (PESQ), SNRLOSS, the log-likelihood ratio, and the frequency-weighted segmental signal-to-noise ratio.
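The SRR-based masking rule described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy energy values, and the threshold values are assumptions chosen for illustration, and the per-channel energies of the target and of the reverberation are assumed to be known in advance (which is what makes the mask "ideal").

```python
import math


def srr_db(target_energy, reverb_energy, eps=1e-12):
    """SRR of one T-F channel in dB; eps guards against log(0) and division by zero."""
    return 10.0 * math.log10((target_energy + eps) / (reverb_energy + eps))


def ideal_binary_mask(target, reverb, threshold_db):
    """Return a 0/1 mask: 1 where the SRR exceeds threshold_db, otherwise 0.

    target, reverb: equally shaped lists of rows (frequency x time) holding the
    energy of the target speech and of the reverberation in each T-F channel.
    threshold_db is the pre-selected threshold; its value here is an assumption.
    """
    return [
        [1 if srr_db(t, r) > threshold_db else 0 for t, r in zip(t_row, r_row)]
        for t_row, r_row in zip(target, reverb)
    ]


def apply_mask(mixture, mask):
    """Retain the T-F channels where mask is 1 and eliminate the others."""
    return [
        [x * m for x, m in zip(x_row, m_row)]
        for x_row, m_row in zip(mixture, mask)
    ]


# Toy 2 x 2 example (hypothetical energies, not data from the paper).
target = [[1.0, 0.1], [0.5, 0.01]]
reverb = [[0.1, 1.0], [0.5, 1.0]]
mixture = [[t + r for t, r in zip(t_row, r_row)]
           for t_row, r_row in zip(target, reverb)]

mask = ideal_binary_mask(target, reverb, threshold_db=-5.0)
enhanced = apply_mask(mixture, mask)
print(mask)
```

In this toy example the channel whose SRR is exactly 0 dB is retained at a threshold of -5 dB but eliminated at a threshold of 0 dB, which shows how the choice of threshold trades residual reverberation against removed speech energy.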
Cite this article
Saleem, N., Mustafa, E., Nawaz, A. et al. Ideal binary masking for reducing convolutive noise. Int J Speech Technol 18, 547–554 (2015). https://doi.org/10.1007/s10772-015-9298-0
DOI: https://doi.org/10.1007/s10772-015-9298-0