Abstract
This paper presents non-linear adaptive speech enhancement schemes inspired by features of early auditory processing. A generic multi-microphone sub-band adaptive (MMSBA) framework is described that allows manipulation of several factors that may influence the intelligibility and perceived quality of the processed speech. The proposed framework supports the inclusion of: a non-linear distribution of sub-bands (as in human hearing); cross-band effects such as lateral inhibition; and robust adaptive metrics for selecting an appropriate coherent or incoherent noise canceller for each sub-band, based on identified features of the band-limited signals from multiple sensors during silence periods. An efficient higher-order statistics (HOS) based speech/non-speech detector is proposed to enable effective adaptive control of MMSBA filtering in response to the acoustic environment. New hybrid extensions of the MMSBA scheme incorporating neural networks and post-Wiener filtering are also described, and their comparative performance is assessed in real reverberant environments. Finally, some future research directions for MMSBA-based speech enhancement are proposed, including possible alternative strategies based on stochastic resonance.
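Two of the framework's ingredients, the non-linear sub-band distribution and the per-band choice between a coherent and an incoherent noise canceller, can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the Greenwood cochlear frequency-position function as the "human-like" band spacing, which the abstract does not specify, and it uses a zero-lag correlation coefficient between two sensors' noise-only frames as a crude stand-in for a proper coherence estimate. The function names and the `threshold` value are hypothetical.

```python
import math

# Greenwood human cochlear map: f = A * (10**(ALPHA * x) - K), where x is the
# fractional distance along the basilar membrane (0 = apex, 1 = base).
# Using this map for the sub-band spacing is an assumption of this sketch.
A, ALPHA, K = 165.4, 2.1, 0.88


def greenwood_hz(x):
    """Frequency in Hz at fractional cochlear position x."""
    return A * (10 ** (ALPHA * x) - K)


def greenwood_pos(f_hz):
    """Fractional cochlear position for a frequency in Hz (inverse map)."""
    return math.log10(f_hz / A + K) / ALPHA


def subband_edges(n_bands, f_lo, f_hi):
    """Edges of n_bands sub-bands equally spaced in cochlear position."""
    x_lo, x_hi = greenwood_pos(f_lo), greenwood_pos(f_hi)
    step = (x_hi - x_lo) / n_bands
    return [greenwood_hz(x_lo + i * step) for i in range(n_bands + 1)]


def choose_canceller(noise_a, noise_b, threshold=0.5):
    """Pick a canceller type for one sub-band.

    noise_a and noise_b are equal-length band-limited samples from two
    sensors, taken during a silence (noise-only) period. The threshold is
    a hypothetical tuning value, not taken from the paper.
    """
    n = len(noise_a)
    mean_a = sum(noise_a) / n
    mean_b = sum(noise_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(noise_a, noise_b))
    var_a = sum((a - mean_a) ** 2 for a in noise_a)
    var_b = sum((b - mean_b) ** 2 for b in noise_b)
    if var_a == 0 or var_b == 0:
        return "incoherent"  # no usable noise reference in this band
    rho = cov / math.sqrt(var_a * var_b)
    return "coherent" if abs(rho) >= threshold else "incoherent"


if __name__ == "__main__":
    for lo, hi in zip(subband_edges(8, 100.0, 4000.0)[:-1],
                      subband_edges(8, 100.0, 4000.0)[1:]):
        print(f"{lo:8.1f} Hz to {hi:8.1f} Hz")
```

Because the bands are equally spaced in cochlear position rather than in Hz, they are narrow at low frequencies and widen towards high frequencies. A practical system would estimate coherence over frequency and time rather than rely on a single zero-lag correlation.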
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hussain, A., Durrani, T.S., Alkulaibi, A., Mtetwa, N. (2005). Nonlinear Adaptive Speech Enhancement Inspired by Early Auditory Processing. In: Chollet, G., Esposito, A., Faundez-Zanuy, M., Marinaro, M. (eds) Nonlinear Speech Modeling and Applications. NN 2004. Lecture Notes in Computer Science, vol 3445. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11520153_13
DOI: https://doi.org/10.1007/11520153_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-27441-4
Online ISBN: 978-3-540-31886-6
eBook Packages: Computer Science, Computer Science (R0)