Elsevier

Signal Processing

Volume 88, Issue 7, July 2008, Pages 1881-1888

Fast communication
Extension of the local subspace method to enhancement of speech with colored noise

https://doi.org/10.1016/j.sigpro.2008.01.008

Abstract

Based on the dynamic features of human speech, the local projection (LP) method has been adapted to the enhancement of speech corrupted by white noise. As an extension of the LP method, a strategy with two rounds of projection is introduced to enhance speech contaminated by colored noise. Colored noise mainly resides in a low-dimensional subspace, and in this communication it is assumed to be stationary. In step one, a noise-dominated subspace is estimated from colored noise taken from speech silence frames. Then, for each reference phase point, the components projected onto the noise-dominated subspace are deleted, and the enhanced speech is reconstructed from the remaining components. The residual error in the output of step one tends to be distributed uniformly across all directions. Therefore, in step two, the LP method is applied to the output of step one, treating the residual error as white noise. This strategy is then adapted to continuous speech. The results show that the strategy is more effective than the LP method in enhancing speech corrupted by colored noise and is comparable to two typical speech enhancement methods.

Introduction

In the past several decades, a variety of speech enhancement methods have been proposed, including noise suppression in the frequency domain (e.g., spectral subtraction [1] and Wiener filtering [1]) and noise elimination in the signal subspace [2], [3]. The signal space, when properly reconstructed from noisy speech, can be divided into two orthogonal subspaces: (1) the noise subspace, which contains components of the noise process only, and (2) the signal subspace, which contains the dominant speech signal plus a certain amount of noise. The components in the noise subspace are deleted, and the enhanced speech is estimated from the remaining components in the signal subspace.
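As a rough illustration of the subspace idea described above, the following Python/NumPy sketch eigendecomposes the sample covariance of a set of embedded vectors, keeps the `k` leading directions as the signal subspace, and discards the orthogonal complement. It assumes white noise and a known signal dimension `k` (a hypothetical parameter), and it uses plain truncation; the estimators of [2], [3] are more elaborate and are not reproduced here.

```python
import numpy as np

def subspace_enhance(vectors: np.ndarray, k: int) -> np.ndarray:
    """Keep the k-dimensional signal subspace of (N, d) noisy vectors.

    Assumptions (illustrative only): white noise, known signal dimension k.
    """
    mean = vectors.mean(axis=0)
    centered = vectors - mean
    cov = centered.T @ centered / len(vectors)
    # eigh returns eigenvalues in ascending order for a symmetric matrix.
    _, vecs = np.linalg.eigh(cov)
    signal_basis = vecs[:, -k:]  # eigenvectors of the k largest eigenvalues
    projector = signal_basis @ signal_basis.T
    # Components in the orthogonal complement (the noise subspace) are deleted.
    return centered @ projector + mean
```

Truncation is the simplest least-squares choice: with white noise spread evenly over all `d` directions, discarding `d - k` of them removes roughly that fraction of the noise energy while leaving a `k`-dimensional signal essentially intact.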

The conventional linear acoustic model of speech overlooks the inherent nonlinearity of speech production. Nonlinear analysis of speech signals reveals chaos-like dynamic features in most phonemes, especially voiced ones, even though continuous speech may be highly non-deterministic and non-stationary [4], [7]. These findings call for nonlinear or hybrid linear/nonlinear models to characterize the nonlinearity of speech. Various techniques based on nonlinear dynamics have been used in speech analysis and processing, for example, classification of isolated phonemes [5] and speech enhancement with the local projection (LP) method [6], [7]. In [7], one of the authors and his collaborators surveyed the LP method from the viewpoint of signal processing and proposed a generalization of it, the local subspace method. However, this method is less effective for speech contaminated by colored noise, because it assumes that the noise is distributed uniformly in all directions, whereas colored noise mainly resides in a particular subspace. It is therefore desirable to extend the LP method to speech corrupted by colored noise. Such an extension would not only provide an alternative approach to speech enhancement but also demonstrate a more general application of techniques based on nonlinear dynamics, and thus possibly promote new developments within this framework. As a follow-up to [7], this communication gives only a brief introduction to the related background; for more details, the reader is referred to [7] and the references therein.
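For readers unfamiliar with the LP method, a simplified sketch of one local projection pass follows. It is an assumption-laden illustration, not the algorithm of [6] or [7]: for each reference phase point it gathers the nearest neighbours, computes their local covariance, and keeps only the component along the `q` directions of largest local variance. This is sensible only when the noise is spread evenly over all directions, which is exactly the assumption that fails for colored noise. The weighting matrix and the mapping of corrections back onto the scalar time series used in published LP variants are omitted, and `n_neighbors` and `q` are hypothetical parameters.

```python
import numpy as np

def local_projection(vectors: np.ndarray, n_neighbors: int, q: int) -> np.ndarray:
    """One simplified local-projection pass over (N, d) phase vectors.

    Hypothetical parameters: n_neighbors is the neighbourhood size and q is
    the dimension of the local signal subspace that is kept.
    """
    vectors = np.asarray(vectors, dtype=float)
    out = np.empty_like(vectors)
    for i, ref in enumerate(vectors):
        # Nearest neighbours of the reference phase point (including itself).
        dist = np.linalg.norm(vectors - ref, axis=1)
        nbhd = vectors[np.argsort(dist)[:n_neighbors]]
        centre = nbhd.mean(axis=0)
        dev = nbhd - centre
        cov = dev.T @ dev / n_neighbors
        # eigh sorts eigenvalues in ascending order; keep the q largest.
        _, vecs = np.linalg.eigh(cov)
        basis = vecs[:, -q:]
        # Retain only the part of (ref - centre) inside the local signal subspace.
        out[i] = centre + basis @ (basis.T @ (ref - centre))
    return out
```

The brute-force neighbour search costs O(N²) distance evaluations; a k-d tree would be the usual replacement for long recordings.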

Recently, a strategy with two rounds of projection has been proposed to reduce colored noise in noisy chaotic data [8]. In this paper, we adopt this strategy and thereby extend the LP method to the enhancement of speech with colored noise. The strategy assumes that the colored noise is stationary, so that its covariance matrix can be estimated from speech silence frames. In the first step, a noise-dominated subspace is obtained; it is spanned by the eigenvectors associated with the several largest eigenvalues of the covariance matrix of the colored noise. Then, for each reference phase point, the components projected onto the noise-dominated subspace are deleted, and the enhanced data are reconstructed from the remaining components. In this way most of the colored noise is eliminated, although a certain degree of speech distortion may also be introduced. The energy of the residual error (the difference between the clean speech and the output of the first step) tends to be distributed “uniformly” across all directions. The residual error can therefore be treated as white noise, and the LP method is then applied to the output of the first step.
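The first round described above can be sketched in Python/NumPy as follows. This is a minimal illustration under stated assumptions: `silence_vectors` are delay vectors embedded from noise-only (silence) frames, the colored noise is zero-mean and stationary, and the number `p` of noise-dominated directions is chosen by hand (a hypothetical parameter; how it is selected in the paper is not covered by this excerpt). The second round, an ordinary white-noise LP pass on the output, is not repeated here.

```python
import numpy as np

def noise_dominated_basis(silence_vectors: np.ndarray, p: int) -> np.ndarray:
    """Orthonormal (d, p) basis of the noise-dominated subspace.

    Spanned by the eigenvectors of the p largest eigenvalues of the
    colored-noise covariance estimated from silence-frame delay vectors.
    """
    centered = silence_vectors - silence_vectors.mean(axis=0)
    cov = centered.T @ centered / len(silence_vectors)
    _, vecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    return vecs[:, -p:]

def remove_noise_subspace(vectors: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Delete each phase point's components in span(basis); keep the rest."""
    return vectors - (vectors @ basis) @ basis.T
```

Because the removal is an orthogonal projection, any speech energy that happens to lie in the noise-dominated subspace is removed as well; this is the source of the speech distortion mentioned in the text.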

The rest of this communication is organized as follows. In Section 2, the principle of our method with two rounds of projection is presented. In Section 3, the implementation of the proposed method is outlined, and numerical results for speech corrupted by different types of environmental noise are shown. Finally, Section 4 gives a discussion.

Section snippets

The principle of the method

Let $\{s_n\}_{n=1}^{L}$ denote the observation of $L$ samples from a dynamic system $S$, e.g., the speech production system. With $d$-dimensional time-delay embedding of $\{s_n\}$, phase vectors $\{\mathbf{s}_n\}$ can be formed as $\mathbf{s}_n = [s_n, s_{n+\tau}, \ldots, s_{n+(d-1)\tau}]^{T}$, where $\tau$ is the time delay and the superscript $T$ denotes vector transpose. According to the embedding theorem [9], the reconstructed attractor $\{\mathbf{s}_n\}$ is topologically equivalent to the evolution of the hidden dynamics of $S$ when $d$ is larger than twice the correlation
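The time-delay embedding defined above can be written in a few lines of NumPy. The sketch below is illustrative only: the values of `d` and `tau` used in the paper are not given in this excerpt, so the example values are arbitrary, and indexing starts at 0 rather than 1.

```python
import numpy as np

def delay_embed(s, d: int, tau: int) -> np.ndarray:
    """Stack phase vectors s_n = [s_n, s_{n+tau}, ..., s_{n+(d-1)tau}] as rows.

    Returns an array of shape (L - (d - 1) * tau, d) for a series of length L.
    """
    s = np.asarray(s)
    n_vectors = len(s) - (d - 1) * tau
    if n_vectors <= 0:
        raise ValueError("series too short for the requested d and tau")
    # Column k holds the samples delayed by k * tau.
    return np.stack([s[k * tau : k * tau + n_vectors] for k in range(d)], axis=1)

# Example with d = 3 and tau = 2 on the samples 0..9.
X = delay_embed(np.arange(10), d=3, tau=2)
print(X.shape)  # (6, 3)
print(X[0])     # [0 2 4]
```

Each row of `X` is one phase point; the subspace and projection steps operate on these rows.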

Implementation and numerical results

In [7], the LP method is applied to speech with additive white noise. To investigate the effectiveness of the proposed method in a more general setting, the NOIZEUS database [14], designed for comparing speech enhancement methods, is used in this communication. Ten sentences are selected from NOIZEUS for our experiment. Three male and three female speakers are involved, and at most two sentences articulated by each speaker are

Discussion

This communication extends the LP method to enhance speech corrupted by colored noise using two rounds of projection in the local phase space, and positive results are obtained. Speech and colored noise reside, respectively, in particular low-dimensional subspaces of the properly reconstructed signal space. If the speech subspace and the colored-noise subspace are separable, then most of the noise components can be removed by nulling out the noise subspace, and the enhanced speech can

Acknowledgments

This research was funded by a Competitive Earmarked Research Grant (CERG) of the Hong Kong University Grants Council, grant number PolyU 5269/06E.

References (18)

  • J. Sun et al., Enhancement of Chinese speech based on nonlinear dynamics, Signal Processing (2007).
  • J.S. Lim et al., Enhancement and bandwidth compression of noisy speech, Proc. IEEE (1979).
  • Y. Ephraim et al., A signal subspace approach for speech enhancement, IEEE Trans. Speech Audio Process. (1995).
  • Y. Hu et al., A generalized subspace approach for enhancing speech corrupted by colored noise, IEEE Trans. Speech Audio Process. (2003).
  • A. Kumar et al., Nonlinear dynamical analysis of speech, J. Acoust. Soc. Amer. (1996).
  • R.J. Povinelli et al., Statistical models of reconstructed phase spaces for signal classification, IEEE Trans. Signal Process. (2006).
  • R. Hegger et al., Noise reduction for human speech signals by local projections in embedding spaces, IEEE Trans. Circuits Syst. I Fundam. Theory Appl. (2001).
  • J. Sun et al., Reducing colored noise for chaotic time series in the local phase space, Phys. Rev. E (2007).
  • F. Takens, Detecting strange attractors in turbulence, in: Lecture Notes in Mathematics, vol. 898, Springer, New York,...
There are more references available in the full text version of this article.

Cited by (21)

  • Supervised single channel dual domains speech enhancement using sparse non-negative matrix factorization

    2020, Digital Signal Processing: A Review Journal
    Citation Excerpt :

    The SE is essential for some signal processing applications, including hearing aids, mobile communications, and preprocessing for speech recognition [1]. The standard SE algorithm can be grouped into three categories [1]: spectral subtraction methods [2–4], statistical-model-based methods [5–10], and subspace methods [11–13]. These methods apply to the circumstance where the noises are stationary.

  • Supervised monaural speech enhancement using two-level complementary joint sparse representations

    2018, Applied Acoustics
    Citation Excerpt :

    Enhancing speech degraded by non-stationary real-world interference has been a topic of research in the last few decades, not only because of its difficulty, but also for various applications, including hearing aids, automatic speech recognition, mobile communications, etc. [1]. Conventional single-channel speech enhancement approaches can be categorized into three branches: spectral subtraction (SS) approaches [2–4], statistical-model-based approaches [5–8] and subspace approaches [9–12], the performances of which are mostly dependent on the estimated noise in the absence of speech activity, so their performance for non-stationary noise may not be satisfactory. Recently, some sparse-model-based speech enhancement approaches have been proposed by more and more researchers.

  • Variance normalized perceptual subspace speech enhancement

    2017, AEU - International Journal of Electronics and Communications