Computational auditory models in predicting noise reduction performance for wideband telephony applications

Pourmand, Nazanin; Parsa, Vijay; Weaver, Angela

doi:10.1007/s10772-013-9189-1

Published: 08 February 2013

Volume 16, pages 363–379 (2013)
International Journal of Speech Technology

Abstract

The performance of several noise reduction algorithms intended for wideband telephony was evaluated both subjectively and objectively. The chosen algorithms were based on statistical modeling, spectral subtraction, Wiener filtering, or subspace modeling principles. A customized wideband noise reduction database was created, containing speech samples corrupted by three types of background noise at three SNR levels, along with their enhanced versions. The overall quality of the speech samples in the database was subsequently rated by a group of listeners with normal hearing. Comprehensive statistical analyses were performed to assess the reliability of the subjective data and the performance of the noise reduction algorithms across the varied noise conditions. The subjective quality ratings were then used to investigate the performance of several auditory model-based objective quality metrics. Key results from these investigations include: (a) there was a high degree of inter- and intra-subject reliability in the subjective ratings, (b) the noise reduction algorithms enhanced speech quality for only a subset of the noise conditions, and (c) the auditory model-based metrics performed similarly in predicting speech quality ratings when the speech quality scores pertaining to a particular noise condition were averaged.
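
The per-condition averaging mentioned in result (c) can be illustrated with a small sketch. It is not the authors' analysis: the condition labels, the rating values, and the helper names `condition_means` and `pearson` are all hypothetical, and the numbers are toy data chosen only to show the mechanics of averaging scores within each noise condition before correlating an objective metric with the subjective ratings.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def condition_means(records):
    """Average subjective and objective scores within each noise condition.

    records: iterable of (condition, subjective_score, objective_score).
    Returns (conditions, subjective_means, objective_means), sorted by condition.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for condition, subjective, objective in records:
        entry = sums[condition]
        entry[0] += subjective
        entry[1] += objective
        entry[2] += 1
    conditions = sorted(sums)
    subj = [sums[c][0] / sums[c][2] for c in conditions]
    obj = [sums[c][1] / sums[c][2] for c in conditions]
    return conditions, subj, obj


# Toy data only (hypothetical): condition = (noise type, SNR in dB).
records = [
    (("babble", 0), 2.0, 1.9),
    (("babble", 0), 2.4, 2.1),
    (("car", 5), 3.1, 2.6),
    (("car", 5), 3.5, 2.9),
    (("street", 10), 4.0, 3.2),
    (("street", 10), 3.8, 3.4),
]

conditions, subj_means, obj_means = condition_means(records)
r = pearson(subj_means, obj_means)
print(f"{len(conditions)} conditions, per-condition correlation r = {r:.3f}")
```

Averaging within a condition removes sample-to-sample scatter, so correlations computed this way are typically higher than those computed over individual samples.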

Notes

LogMMSE and LogMMSE_SPU belong to the same class of noise reduction algorithms and share the same implementation, except that LogMMSE_SPU accounts for the fact that speech is not present at all times: pauses occur even during speech activity. This speech presence uncertainty (SPU) is incorporated through a factor that represents the probability that speech is present at a particular frequency (Loizou 2007).
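
As a rough illustration of the SPU idea, the sketch below computes a per-frequency speech presence probability and uses it to scale a base suppression gain. It is a minimal sketch under stated assumptions, not the implementation evaluated in the article: the likelihood-ratio form of the probability is one commonly used formulation, the multiplicative weighting of the gain is assumed, the prior speech absence probability `q = 0.3` is an arbitrary illustrative value, and the function names are hypothetical. The base gain (for example a log-MMSE gain) is taken as an input rather than computed here.

```python
from math import exp


def speech_presence_probability(xi, gamma, q=0.3):
    """Probability that speech is present in one frequency bin.

    xi:    a priori SNR estimate for the bin (linear scale)
    gamma: a posteriori SNR for the bin (linear scale)
    q:     assumed prior probability of speech absence (illustrative value)
    """
    xi_cond = xi / (1.0 - q)  # a priori SNR conditioned on speech presence
    # Cap the exponent to avoid floating-point overflow at very high SNR.
    v = min(xi_cond / (1.0 + xi_cond) * gamma, 500.0)
    likelihood_ratio = (1.0 - q) / q * exp(v) / (1.0 + xi_cond)
    return likelihood_ratio / (1.0 + likelihood_ratio)


def apply_spu(base_gain, xi, gamma, q=0.3):
    """Scale each bin's base gain by its speech presence probability."""
    return [
        g * speech_presence_probability(x, y, q)
        for g, x, y in zip(base_gain, xi, gamma)
    ]


# Illustrative values for three frequency bins (not measured data).
base_gain = [0.8, 0.5, 0.9]
xi = [1.0, 0.2, 5.0]
gamma = [3.0, 0.5, 12.0]
print(apply_spu(base_gain, xi, gamma))
```

Bins where the probability of speech is low are attenuated more strongly than the base gain alone would suggest, which is the practical effect the note describes.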

References

ANSI S3.5 (1997). Methods for calculation of the speech intelligibility index. Washington: ANSI.
Beaugeant, C., Schönle, M., & Varga, I. (2006). Challenges of 16 kHz in acoustic pre- and post-processing for terminals. IEEE Communications Magazine, 44(5), 98–104.
Chen, G., & Parsa, V. (2007). Loudness pattern-based speech quality evaluation using Bayesian modeling and Markov chain Monte Carlo methods. The Journal of the Acoustical Society of America, 121(2), EL77–EL83.
Choi, J.-H., & Chang, J.-H. (2012). On using acoustic environment classification for statistical model-based speech enhancement. Speech Communication, 54(3), 477–490.
Cox, R. V., Kroon, P., Chen, J. H., Thorkildsen, R., O’Dell, K. M., & Isenberg, D. S. (1995). Speech coders: from idea to product. AT&T Technical Journal, 74, 14–21.
Dau, T., Püschel, D., & Kohlrausch, A. (1996). A quantitative model of the “effective” signal processing in the auditory system. I. Model structure. The Journal of the Acoustical Society of America, 99(6), 3615–3622.
Dau, T., Kollmeier, B., & Kohlrausch, A. (1997). Modeling auditory processing of amplitude modulation. I. Detection and masking with narrow-band carriers. The Journal of the Acoustical Society of America, 102(5), 2892–2905.
Egi, N., Aoki, H., & Takahashi, A. (2008). Objective quality evaluation method for noise-reduced speech. IEICE Transactions on Communications, E91-B(5), 1279–1286.
Falk, T. H., & Chan, W. Y. (2008). A non-intrusive quality measure of dereverberated speech. In International workshop for acoustic echo and noise control.
Garbin, C. (2013). Bivariate correlation comparisons. Retrieved from http://psych.unl.edu/psycrs/statpage/biv_corr_comp_eg.pdf.
Glasberg, B. R., & Moore, B. C. J. (1990). Derivation of auditory filter shapes from notched-noise data. Hearing Research, 47(1–2), 103–138.
Glasberg, B. R., & Moore, B. C. J. (2002). A model of loudness applicable to time-varying sounds. Journal of the Audio Engineering Society, 50(5), 331–342.
Gupta, M., Forrester, C., & Simmons, S. (2009). Review of wideband speech noise reduction techniques. Canadian Acoustics, 37(3), 84–85.
Hansen, M., & Kollmeier, B. (2000). Objective modeling of speech quality with a psychoacoustically validated auditory model. Journal of the Audio Engineering Society, 48(5), 395–409.
Helfenstein, M., & Moschytz, G. S. (2000). Circuits and systems for wireless communications (p. 404). Dordrecht: Kluwer Academic.
Heute, U. (2008). Speech-transmission quality: aspects and assessment for wideband vs. narrowband signals. In Advances in digital speech transmission (pp. 9–50). New York: Wiley.
Holube, I., & Kollmeier, B. (1996). Speech intelligibility prediction in hearing-impaired listeners based on a psychoacoustically motivated perception model. The Journal of the Acoustical Society of America, 100(3), 1703–1716.
Hu, Y., & Loizou, P. C. (2007). Subjective comparison and evaluation of speech enhancement algorithms. Speech Communication, 49(7), 588–601.
Hu, Y., & Loizou, P. C. (2008). Evaluation of objective quality measures for speech enhancement. IEEE Transactions on Audio, Speech, and Language Processing, 16(1), 229–238.
Huber, R., & Kollmeier, B. (2006). PEMO-Q—a new method for objective audio quality assessment using a model of auditory perception. IEEE Transactions on Audio, Speech, and Language Processing, 14(6), 1902–1911.
ITU-T P. 835 (2003). Subjective test methodology for evaluating speech communication systems that include noise suppression algorithm. ITU-T.
ITU-T Rec. P. 563 (2004). Single-ended method for objective speech quality assessment in narrow-band telephony applications. ITU-T.
ITU-T Rec. P. 862 (2001). Perceptual evaluation of speech quality (PESQ). ITU-T.
ITU-T Rec. P. 862.2 (2007). Wideband extension to Recommendation P. 862 for the assessment of wideband telephone networks and speech codecs. ITU-T.
Jelinek, M., & Salami, R. (2004). Noise reduction method for wideband speech coding. In Proc EUSIPCO, Vienna, Austria (pp. 1959–1962).
Jepsen, M. L., Ewert, S. D., & Dau, T. (2008). A computational model of human auditory signal processing and perception. The Journal of the Acoustical Society of America, 124(1), 422–438.
Kabal, P. (2002). TSP speech database. Retrieved from http://www-mmsp.ece.mcgill.ca/Documents/Data/index.html.
Kamath, S. D. (2001). A multi-band spectral subtraction method for speech enhancement (Master’s thesis). Dallas: University of Texas.
Kates, J. M., & Arehart, K. H. (2010). The hearing-aid speech quality index (HASQI). Journal of the Audio Engineering Society, 58(5), 363–381.
Kim, D. (2005). ANIQUE: an auditory model for single-ended speech quality estimation. IEEE Transactions on Speech and Audio Processing, 13(5), 821–831.
Kondo, K. (2012). Subjective quality measurement of speech: its evaluation, estimation and applications (p. 153). Berlin: Springer.
Kressner, A. A., Anderson, D. V., & Rozell, C. J. (2011). Robustness of the hearing aid speech quality index (HASQI). In Workshop on applications of signal processing to audio and acoustics.
Laska, B., Bolic, M., & Goubran, R. (2010). Discrete cosine transform particle filter speech enhancement. Speech Communication, 52, 762–775.
Loizou, P. C. (2007). Speech enhancement: theory and practice. Boca Raton: CRC Press.
Matsunaga, M. (2007). Familywise error in multiple comparisons: disentangling a knot through a critique of O’Keefe’s arguments against alpha adjustment. Communication Methods and Measures, 1(4), 243–265.
Moore, B. C. J., & Glasberg, B. R. (2004). A revised model of loudness perception applied to cochlear hearing loss. Hearing Research, 188, 70–88.
Moore, B. C. J., & Tan, C.-T. (2003). Perceived naturalness of spectrally distorted speech and music. The Journal of the Acoustical Society of America, 114, 408–419.
Moore, B. C. J., & Tan, C. T. (2004). Development and validation of a method for predicting the perceived naturalness of sounds subjected to spectral distortion. Journal of the Audio Engineering Society, 52(9), 900–914.
Möller, S., Chan, W., Côté, N., Falk, T. H., Raake, A., & Wältermann, M. (2011). Speech quality estimation: models and trends. IEEE Signal Processing Magazine, 28(6), 18–28.
Quackenbush, S. R., Barnwell, T. P., & Clements, M. A. (1988). Objective measures of speech quality. New York: Prentice Hall.
Ricketts, T. A., Dittberner, A. B., & Johnson, E. E. (2008). High frequency amplification and sound quality in listeners with normal through moderate hearing loss. Journal of Speech, Language, and Hearing Research, 51, 160–172.
Rohdenburg, T., Hohmann, V., & Kollmeier, B. (2005). Objective perceptual quality measures for the evaluation of noise reduction schemes. In 9th international workshop on acoustic echo and noise control (pp. 169–172).
Salmela, J., & Mattila, V. (2004). New intrusive method for the objective quality evaluation of acoustic noise suppression in mobile communications. In Proc. 116th audio eng. soc. conv.
Scalart, P., & Filho, J. V. (1996). Speech enhancement based on a priori signal to noise estimation. In International conference on acoustics, speech, and signal processing (Vol. 2, pp. 629–632).
Sohn, J., Kim, N. S., & Sung, W. (1999). A statistical model-based voice activity detection. IEEE Signal Processing Letters, 6(1), 1–3.
Stelmachowicz, P., Pittman, A., Hoover, B., & Lewis, D. (2001). Effect of stimulus bandwidth on the perception of /s/ in normal- and hearing-impaired children and adults. The Journal of the Acoustical Society of America, 110(4), 2183–2190.
Stoll, G., & Kozamernik, F. (2000). EBU listening tests on Internet audio codecs (EBU Technical Review).
Tchorz, J., & Kollmeier, B. (1999). A model of auditory perception as front end for automatic speech recognition. The Journal of the Acoustical Society of America, 106(4), 2040–2050.
Varga, I., De Iacovo, R. D., & Usai, P. (2006). Standardization of the AMR wideband speech codec in 3GPP and ITU-T. IEEE Communications Magazine, May, 66–73.
Voran, S. (1997). Listener ratings of speech passbands. In Speech coding for telecommunications proceeding (pp. 81–82).

Acknowledgements

This work was supported by the Ontario Research Fund and Research In Motion Company. The authors would like to thank Chris Forrester, Malay Gupta, and Nikolai Kouznetsov for their helpful and supportive comments.

Author information

Authors and Affiliations

National Centre for Audiology & Dept. of Electrical and Computer Engineering, University of Western Ontario, London, ON, N6G 1H1, Canada
Nazanin Pourmand, Vijay Parsa & Angela Weaver

Corresponding author

Correspondence to Nazanin Pourmand.

About this article

Cite this article

Pourmand, N., Parsa, V. & Weaver, A. Computational auditory models in predicting noise reduction performance for wideband telephony applications. Int J Speech Technol 16, 363–379 (2013). https://doi.org/10.1007/s10772-013-9189-1

Received: 20 October 2012
Accepted: 28 January 2013
Published: 08 February 2013
Issue Date: December 2013
DOI: https://doi.org/10.1007/s10772-013-9189-1
