Computational auditory models in predicting noise reduction performance for wideband telephony applications

International Journal of Speech Technology

Abstract

The performance of several noise reduction algorithms intended for wideband telephony was evaluated both subjectively and objectively. The chosen algorithms were based on statistical modeling, spectral subtraction, Wiener filtering, or subspace modeling principles. A customized wideband noise reduction database was created, containing speech samples corrupted by three types of background noise at three SNR levels, along with their enhanced versions. The overall quality of the speech samples in the database was subsequently rated by a group of listeners with normal hearing. Comprehensive statistical analyses were performed to assess the reliability of the subjective data and the performance of the noise reduction algorithms across the noise conditions. The subjective quality ratings were then used to investigate the performance of several auditory model-based objective quality metrics. Key results from these investigations include: (a) there was a high degree of inter- and intra-subject reliability in the subjective ratings, (b) the noise reduction algorithms enhanced speech quality for only a subset of the noise conditions, and (c) the auditory model-based metrics performed similarly in predicting speech quality ratings when the speech quality scores pertaining to a particular noise condition were averaged.
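Result (c) rests on averaging scores per noise condition before comparing objective metrics against listener ratings. The following is a minimal Python sketch of that kind of analysis, not the authors' code: the record layout, the condition labels, and all numbers are made up for illustration, and Pearson correlation is assumed as the agreement measure.

```python
from collections import defaultdict
from math import sqrt


def condition_means(records):
    """Average subjective and objective scores over samples sharing a condition.

    `records` is an iterable of (condition, subjective, objective) tuples.
    This tuple layout is an assumption made for the sketch.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for condition, subjective, objective in records:
        acc = sums[condition]
        acc[0] += subjective
        acc[1] += objective
        acc[2] += 1
    return {c: (s / n, o / n) for c, (s, o, n) in sums.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical data: (condition label, listener rating, objective metric score).
records = [
    ("babble_0dB", 1.8, 0.30), ("babble_0dB", 2.2, 0.40),
    ("babble_10dB", 3.1, 0.62), ("babble_10dB", 2.9, 0.58),
    ("car_10dB", 4.0, 0.81), ("car_10dB", 3.8, 0.79),
]

means = condition_means(records)
subjective = [m[0] for m in means.values()]
objective = [m[1] for m in means.values()]
r = pearson(subjective, objective)
print(f"per-condition correlation: {r:.3f}")
```

Averaging first removes sample-to-sample scatter within a condition, which is one reason different metrics can look similar at the condition level even if they differ on individual samples.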

Notes

  1. LogMMSE and LogMMSE_SPU belong to the same class of noise reduction algorithms and share the same implementation, except that LogMMSE_SPU accounts for the fact that speech is not present at all times: there are pauses even during speech activity. This speech presence uncertainty (SPU) is taken into account through a factor that represents the probability that speech is present at a particular frequency (Loizou 2007).
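To make the idea of a per-frequency speech presence factor concrete, here is a hedged Python sketch. It is not the LogMMSE_SPU implementation: it assumes a complex Gaussian model for the noisy spectral coefficient, an illustrative prior `q` for speech absence, a plain Wiener gain as a stand-in for the log-MMSE gain, and a simple multiplicative combination. All function and parameter names are hypothetical.

```python
from math import exp


def speech_presence_probability(xi, gamma, q=0.5):
    """Posterior probability that speech is present in one frequency bin.

    xi    : a priori SNR given that speech is present (linear scale)
    gamma : a posteriori SNR, |Y|^2 / noise variance (linear scale)
    q     : assumed prior probability of speech absence (illustrative value)
    """
    # Likelihood ratio of "speech present" vs "speech absent" under a
    # complex Gaussian model; the exponent is capped to avoid overflow.
    v = min(gamma * xi / (1.0 + xi), 500.0)
    likelihood_ratio = exp(v) / (1.0 + xi)
    weighted_ratio = (1.0 - q) / q * likelihood_ratio
    return weighted_ratio / (1.0 + weighted_ratio)


def spu_weighted_gain(xi, gamma, q=0.5):
    """Scale a base gain by the speech presence probability.

    The Wiener gain xi / (1 + xi) is used here only as a stand-in; a
    log-MMSE gain would additionally need the exponential integral.
    """
    base_gain = xi / (1.0 + xi)
    return speech_presence_probability(xi, gamma, q) * base_gain


# A weak bin is judged unlikely to contain speech, a strong bin likely.
p_weak = speech_presence_probability(xi=1.0, gamma=0.1)
p_strong = speech_presence_probability(xi=1.0, gamma=10.0)
gain_strong = spu_weighted_gain(xi=1.0, gamma=10.0)
print(p_weak, p_strong, gain_strong)
```

The effect is that bins dominated by noise are attenuated more than the base gain alone would suggest, while bins with strong evidence of speech are left close to the base gain.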

References

  • ANSI S3.5 (1997). Methods for calculation of the speech intelligibility index. Washington: ANSI.

  • Beaugeant, C., Schönle, M., & Varga, I. (2006). Challenges of 16 kHz in acoustic pre- and post-processing for terminals. IEEE Communications Magazine, 44(5), 98–104.

  • Chen, G., & Parsa, V. (2007). Loudness pattern-based speech quality evaluation using Bayesian modeling and Markov chain Monte Carlo methods. The Journal of the Acoustical Society of America, 121(2), EL77-83.

  • Choi, J.-H., & Chang, J.-H. (2012). On using acoustic environment classification for statistical model-based speech enhancement. Speech Communication, 54(3), 477–490.

  • Cox, R. V., Kroon, P., Chen, J. H., Thorkildsen, R., O’Dell, K. M., & Isenberg, D. S. (1995). Speech coders: from idea to product. AT&T Technical Journal, 74, 14–21.

  • Dau, T., Püschel, D., & Kohlrausch, A. (1996). A quantitative model of the “effective” signal processing in the auditory system. I. Model structure. The Journal of the Acoustical Society of America, 99(6), 3615–3622.

  • Dau, T., Kollmeier, B., & Kohlrausch, A. (1997). Modeling auditory processing of amplitude modulation. I. Detection and masking with narrow-band carriers. The Journal of the Acoustical Society of America, 102(5), 2892–2905.

  • Egi, N., Aoki, H., & Takahashi, A. (2008). Objective quality evaluation method for noise-reduced speech. IEICE Transactions on Communications, E91-B(5), 1279–1286.

  • Falk, T. H., & Chan, W. Y. (2008). A non-intrusive quality measure of dereverberated speech. In International workshop for acoustic echo and noise control.

  • Garbin, C. (2013). Bivariate correlation comparisons. Retrieved from http://psych.unl.edu/psycrs/statpage/biv_corr_comp_eg.pdf.

  • Glasberg, B. R., & Moore, B. C. J. (1990). Derivation of auditory filter shapes from notched-noise data. Hearing Research, 47(1–2), 103–138.

  • Glasberg, B. R., & Moore, B. C. J. (2002). A model of loudness applicable to time-varying sounds. Journal of the Audio Engineering Society, 50(5), 331–342.

  • Gupta, M., Forrester, C., & Simmons, S. (2009). Review of wideband speech noise reduction techniques. Canadian Acoustics, 37(3), 84–85.

  • Hansen, M., & Kollmeier, B. (2000). Objective modeling of speech quality with a psychoacoustically validated auditory model. Journal of the Audio Engineering Society, 48(5), 395–409.

  • Helfenstein, M., & Moschytz, G. S. (2000). Circuits and systems for wireless communications (p. 404). Dordrecht: Kluwer Academic.

  • Heute, U. (2008). Speech-transmission quality: aspects and assessment for wideband vs. narrowband signals. In Advances in digital speech transmission (pp. 9–50). New York: Wiley.

  • Holube, I., & Kollmeier, B. (1996). Speech intelligibility prediction in hearing-impaired listeners based on a psychoacoustically motivated perception model. The Journal of the Acoustical Society of America, 100(3), 1703–1716.

  • Hu, Y., & Loizou, P. C. (2007). Subjective comparison and evaluation of speech enhancement algorithms. Speech Communication, 49(7), 588–601.

  • Hu, Y., & Loizou, P. C. (2008). Evaluation of objective quality measures for speech enhancement. IEEE Transactions on Audio, Speech, and Language Processing, 16(1), 229–238.

  • Huber, R., & Kollmeier, B. (2006). PEMO-Q—a new method for objective audio quality assessment using a model of auditory perception. IEEE Transactions on Audio, Speech, and Language Processing, 14(6), 1902–1911.

  • ITU-T P. 835 (2003). Subjective test methodology for evaluating speech communication systems that include noise suppression algorithm. ITU-T.

  • ITU-T Rec. P. 563 (2004). Single-ended method for objective speech quality assessment in narrow-band telephony applications. ITU-T.

  • ITU-T Rec. P. 862 (2001). Perceptual evaluation of speech quality (PESQ). ITU-T.

  • ITU-T Rec. P. 862.2 (2007). Wideband extension to Recommendation P. 862 for the assessment of wideband telephone networks and speech codecs. ITU-T.

  • Jelinek, M., & Salami, R. (2004). Noise reduction method for wideband speech coding. In Proc EUSIPCO, Vienna, Austria (pp. 1959–1962).

  • Jepsen, M. L., Ewert, S. D., & Dau, T. (2008). A computational model of human auditory signal processing and perception. The Journal of the Acoustical Society of America, 124(1), 422–438.

  • Kabal, P. (2002). TSP speech database. Retrieved from http://www-mmsp.ece.mcgill.ca/Documents/Data/index.html.

  • Kamath, S. D. (2001). A multi-band spectral subtraction method for speech enhancement (Master’s thesis). Dallas: University of Texas.

  • Kates, J. M., & Arehart, K. H. (2010). The hearing-aid speech quality index (HASQI). Journal of the Audio Engineering Society, 58(5), 363–381.

  • Kim, D. (2005). ANIQUE: an auditory model for single-ended speech quality estimation. IEEE Transactions on Speech and Audio Processing, 13(5), 821–831.

  • Kondo, K. (2012). Subjective quality measurement of speech: its evaluation, estimation and applications (p. 153). Berlin: Springer.

  • Kressner, A. A., Anderson, D. V., & Rozell, C. J. (2011). Robustness of the hearing aid speech quality index (HASQI). In Workshop on applications of signal processing to audio and acoustics.

  • Laska, B., Bolic, M., & Goubran, R. (2010). Discrete cosine transform particle filter speech enhancement. Speech Communication, 52, 762–775.

  • Loizou, P. C. (2007). Speech enhancement: theory and practice. Boca Raton: CRC Press.

  • Matsunaga, M. (2007). Familywise error in multiple comparisons: disentangling a knot through a critique of O’Keefe’s arguments against alpha adjustment. Communication Methods and Measures, 1(4), 243–265.

  • Moore, B. C. J., & Glasberg, B. R. (2004). A revised model of loudness perception applied to cochlear hearing loss. Hearing Research, 188, 70–88.

  • Moore, B. C. J., & Tan, C.-T. (2003). Perceived naturalness of spectrally distorted speech and music. The Journal of the Acoustical Society of America, 114, 408–419.

  • Moore, B. C. J., & Tan, C. T. (2004). Development and validation of a method for predicting the perceived naturalness of sounds subjected to spectral distortion. Journal of the Audio Engineering Society, 52(9), 900–914.

  • Möller, S., Chan, W., Côté, N., Falk, T. H., Raake, A., & Wältermann, M. (2011). Speech quality estimation: models and trends. IEEE Signal Processing Magazine, 28(6), 18–28.

  • Quackenbush, S. R., Barnwell, T. P., & Clements, M. A. (1988). Objective measures of speech quality. New York: Prentice Hall.

  • Ricketts, T. A., Dittberner, A. B., & Johnson, E. E. (2008). High frequency amplification and sound quality in listeners with normal through moderate hearing loss. Journal of Speech, Language, and Hearing Research, 51, 160–172.

  • Rohdenburg, T., Hohmann, V., & Kollmeier, B. (2005). Objective perceptual quality measures for the evaluation of noise reduction schemes. In 9th international workshop on acoustic echo and noise control (pp. 169–172).

  • Salmela, J., & Mattila, V. (2004). New intrusive method for the objective quality evaluation of acoustic noise suppression in mobile communications. In Proc. 116th audio eng. soc. conv.

  • Scalart, P., & Filho, J. V. (1996). Speech enhancement based on a priori signal to noise estimation. In International conference on acoustics, speech, and signal processing (Vol. 2, pp. 629–632).

  • Sohn, J., Kim, N. S., & Sung, W. (1999). A statistical model-based voice activity detection. IEEE Signal Processing Letters, 6(1), 1–3.

  • Stelmachowicz, P., Pittman, A., Hoover, B., & Lewis, D. (2001). Effect of stimulus bandwidth on the perception of /s/ in normal- and hearing-impaired children and adults. The Journal of the Acoustical Society of America, 110(4), 2183–2190.

  • Stoll, G., & Kozamernik, F. (2000). EBU listening tests on Internet audio codecs (EBU Technical Review).

  • Tchorz, J., & Kollmeier, B. (1999). A model of auditory perception as front end for automatic speech recognition. The Journal of the Acoustical Society of America, 106(4), 2040–2050.

  • Varga, I., De Iacovo, R. D., & Usai, P. (2006). Standardization of the AMR wideband speech codec in 3GPP and ITU-T. IEEE Communications Magazine, 44(5), 66–73.

  • Voran, S. (1997). Listener ratings of speech passbands. In Speech coding for telecommunications proceeding (pp. 81–82).

Acknowledgements

This work was supported by the Ontario Research Fund and Research In Motion. The authors would like to thank Chris Forrester, Malay Gupta, and Nikolai Kouznetsov for their helpful and supportive comments.

Author information

Corresponding author

Correspondence to Nazanin Pourmand.

About this article

Cite this article

Pourmand, N., Parsa, V. & Weaver, A. Computational auditory models in predicting noise reduction performance for wideband telephony applications. Int J Speech Technol 16, 363–379 (2013). https://doi.org/10.1007/s10772-013-9189-1

  • DOI: https://doi.org/10.1007/s10772-013-9189-1
