Fast communication
Extension of the local subspace method to enhancement of speech with colored noise
Introduction
In the past several decades, a variety of speech enhancement methods have been proposed, including noise suppression in the frequency domain (e.g., spectral subtraction [1], Wiener filtering [1]) and noise elimination in the signal subspace [2], [3]. In the subspace approach, the signal space, properly reconstructed from noisy speech, is divided into two orthogonal subspaces: (1) the noise subspace, which contains components from the noise process only, and (2) the signal subspace, which contains the dominant speech signal plus a certain amount of noise. The components in the noise subspace are deleted, and the enhanced speech is estimated from the remaining components in the signal subspace.
The conventional linear acoustical model of speech overlooks the inherent nonlinearity of speech production. Nonlinear analysis of speech signals reveals chaos-like dynamic features in most phonemes, especially the voiced ones, even though continuous speech may be highly non-deterministic and non-stationary [4], [7]. These facts call for nonlinear or hybrid linear/nonlinear models to characterize the nonlinearity in speech. Various techniques based on nonlinear dynamics have been used in speech analysis and processing, for example, classification of isolated phonemes [5] and speech enhancement with the local projection (LP) method [6], [7]. In the work of one of the authors and his collaborators [7], the LP method is surveyed from the viewpoint of signal processing, and a generalization of the LP method, i.e., the local subspace method, is proposed. However, this method is not very effective for speech contaminated by colored noise, because it assumes that the noise is distributed uniformly over all directions, whereas colored noise resides mainly in a certain subspace. It is therefore desirable to extend the LP method to speech corrupted by colored noise. Such an extension not only provides an alternative method for speech enhancement, but also demonstrates a more general application of techniques based on the framework of nonlinear dynamics, and may thus promote new developments of this framework. As a follow-up to the previous work [7], this communication gives only a brief introduction to the related background; readers are referred to [7] and the references therein for more details.
Recently, a strategy with two rounds of projection has been proposed to reduce colored noise in noisy chaotic data [8]. In this communication, we adopt this strategy and thus extend the LP method to the enhancement of speech with colored noise. The strategy assumes that the colored noise is stationary, so that the covariance matrix of the colored noise can be estimated from the silence frames of the speech. In the first step, a noise-dominated subspace is obtained, spanned by the eigenvectors associated with the several largest eigenvalues of the covariance matrix of the colored noise. Then, for each reference phase point, the components projected into the noise-dominated subspace are deleted, and the enhanced data are reconstructed from the remaining components. In this way most of the colored noise can be eliminated, although a certain degree of speech distortion may also be introduced. The energy of the residual error (the difference between the clean speech and the output of the first step) tends to be distributed "uniformly" over all directions. The residual error can therefore be treated as white noise, and in the second step the LP method is applied to the output of the first step.
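The two rounds of projection described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`delay_embed`, `remove_noise_subspace`, `local_projection`, `unembed`), the parameter values (`d`, `k`, `q`, `n_neighbors`), the Euclidean neighbor search, and the synthetic test signal are all assumptions made for illustration, and the LP step is reduced to a plain local principal-component projection.

```python
import numpy as np

def delay_embed(x, d, tau=1):
    """Stack delay vectors [x[n], x[n+tau], ..., x[n+(d-1)tau]] as rows."""
    x = np.asarray(x, dtype=float)
    n_vec = len(x) - (d - 1) * tau
    return np.stack([x[i * tau: i * tau + n_vec] for i in range(d)], axis=1)

def remove_noise_subspace(vectors, noise_vectors, k):
    """Round 1: delete components along the k dominant noise eigenvectors."""
    cov = np.cov(noise_vectors, rowvar=False)   # noise covariance estimate
    _, eigvecs = np.linalg.eigh(cov)            # eigenvalues in ascending order
    U = eigvecs[:, -k:]                         # noise-dominated subspace
    return vectors - (vectors @ U) @ U.T

def local_projection(vectors, q, n_neighbors):
    """Round 2 (simplified LP): keep q dominant local directions per point."""
    out = np.empty_like(vectors)
    for i, v in enumerate(vectors):
        dist = np.linalg.norm(vectors - v, axis=1)
        nbrs = vectors[np.argsort(dist)[:n_neighbors]]
        center = nbrs.mean(axis=0)
        _, eigvecs = np.linalg.eigh(np.cov(nbrs, rowvar=False))
        V = eigvecs[:, -q:]                     # local signal subspace
        out[i] = center + ((v - center) @ V) @ V.T
    return out

def unembed(vectors, length, tau=1):
    """Average overlapping delay-vector entries back into a 1-D series."""
    n_vec, d = vectors.shape
    total = np.zeros(length)
    count = np.zeros(length)
    for i in range(d):
        total[i * tau: i * tau + n_vec] += vectors[:, i]
        count[i * tau: i * tau + n_vec] += 1
    return total / count

# Synthetic stand-in for noisy speech (assumption): a sinusoid preceded by
# 200 samples of "silence", corrupted by AR(1)-filtered (colored) noise.
rng = np.random.default_rng(0)
n = np.arange(600)
clean = np.sin(2 * np.pi * n / 25.0)
clean[:200] = 0.0
white = rng.standard_normal(600)
noise = np.zeros(600)
for t in range(1, 600):
    noise[t] = 0.9 * noise[t - 1] + 0.3 * white[t]
noisy = clean + noise

d, k, q = 8, 2, 3                               # illustrative values only
X = delay_embed(noisy, d)
N = delay_embed(noisy[:200], d)                 # noise-only phase vectors
step1 = remove_noise_subspace(X, N, k)
step2 = local_projection(step1, q, n_neighbors=30)
enhanced = unembed(step2, len(noisy))
```

The first round is a single global projection applied to every phase vector, while the second round recomputes a subspace for each neighborhood. In practice the choice of `k` governs the trade-off mentioned above: a larger `k` removes more colored noise but also deletes more speech energy.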
The rest of this communication is organized as follows. In Section 2, the principle of our method with two rounds of projection is presented. In Section 3, the implementation of the proposed method is outlined, and numerical results for speech corrupted by different types of environmental noise are shown. Finally, a discussion is given in Section 4.
Section snippets
The principle of the method
Let $\{x_n\}_{n=1}^{L}$ denote the observation of $L$ samples from a dynamic system, e.g., the speech production system. With $d$-dimensional time delay embedding of $\{x_n\}$, phase vectors can be formed as $\mathbf{x}_n = [x_n, x_{n+\tau}, \ldots, x_{n+(d-1)\tau}]^T$, $n = 1, \ldots, L-(d-1)\tau$, where $\tau$ is the time delay, and superscript $T$ denotes vector transpose. According to the embedding theorem [9], the reconstructed attractor is topologically equivalent to the evolution of the hidden dynamics of the system when $d$ is greater than twice the correlation
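The time delay embedding described above can be sketched as follows. This is an illustrative example only: the function name `delay_embed`, the argument names `x`, `d`, and `tau`, and the forward-delay ordering of the vector entries are assumptions, since the original notation is not fully recoverable here.

```python
import numpy as np

def delay_embed(x, d, tau=1):
    """Return the phase vectors of a 1-D series as rows of a 2-D array.

    Row n is [x[n], x[n + tau], ..., x[n + (d - 1) * tau]], so a series of
    L samples yields L - (d - 1) * tau phase vectors of dimension d.
    """
    x = np.asarray(x, dtype=float)
    n_vec = len(x) - (d - 1) * tau
    if n_vec <= 0:
        raise ValueError("series too short for this d and tau")
    # Column i holds the series shifted by i * tau samples.
    return np.stack([x[i * tau: i * tau + n_vec] for i in range(d)], axis=1)

# Six samples embedded in three dimensions with unit delay give four
# phase vectors: [0, 1, 2], [1, 2, 3], [2, 3, 4], [3, 4, 5].
vectors = delay_embed(np.arange(6), d=3, tau=1)
```

Storing the phase vectors as rows makes later steps, such as neighbor searches and covariance estimates, straightforward array operations.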
Implementation and numerical results
In [7], the LP method is applied to speech with additive white noise. To investigate the effectiveness of the proposed method in more general applications, the NOIZEUS database [14], designed for the comparison of speech enhancement algorithms, is utilized in this communication. Ten speech sentences are selected from NOIZEUS for our experiment. Three male and three female speakers are involved, and at most two sentences articulated by each speaker are
Discussion
This communication extends the LP method to enhance speech corrupted by colored noise with two rounds of projection in the local phase space, and positive results are obtained. Speech and colored noise reside, respectively, in particular low dimensional subspaces of the properly reconstructed signal space. If the subspace of speech and the subspace of colored noise are separable, then most of the noise components can be reduced by nulling out the subspace of noise, and the enhanced speech can
Acknowledgments
This research was funded by a Hong Kong University Grants Council Grant Competitive Earmarked Research Grant (CERG) number PolyU 5269/06E.
References (18)
- et al., Enhancement of Chinese speech based on nonlinear dynamics, Signal Processing (2007)
- et al., Enhancement and bandwidth compression of noisy speech, Proc. IEEE (1979)
- et al., A signal subspace approach for speech enhancement, IEEE Trans. Speech Audio Process. (1995)
- et al., A generalized subspace approach for enhancing speech corrupted by colored noise, IEEE Trans. Speech Audio Process. (2003)
- et al., Nonlinear dynamical analysis of speech, J. Acoust. Soc. Amer. (1996)
- et al., Statistical models of reconstructed phase spaces for signal classification, IEEE Trans. Signal Process. (2006)
- et al., Noise reduction for human speech signals by local projections in embedding spaces, IEEE Trans. Circuits Syst. I Fundam. Theory Appl. (2001)
- et al., Reducing colored noise for chaotic time series in the local phase space, Phys. Rev. E (2007)
- F. Takens, Detecting strange attractors in turbulence, in: Lecture Notes in Mathematics, vol. 898, Springer, New York,...