Abstract
This paper analyzes how FIR filter design can be combined with speech processing to improve speech quality. A new time-domain approach based on the least Pth norm is presented to extract the maximum information that represents speech. The aim is to improve the perceived quality of speech by introducing a least Pth norm algorithm that attenuates the noise in contaminated speech. The approach uses a filter bank structure and a method for filtering and separating an information signal, in particular a speech signal, into different bands. The desired signal is then reconstructed from the independent components representing each band. Unlike traditional approaches, it requires no a priori knowledge of the noise statistics; the noise signals are only assumed to have finite energy. Because the estimation criterion for the filter design is to minimize the worst possible amplification of the estimation error signal in terms of modeling errors and additive noise, the approach is highly robust and appropriate for practical speech analysis and synthesis. The FIR digital filter banks are designed optimally in the minimax sense using the least Pth approach. A signal-to-noise ratio (SNR) of around 50–60 dB is achieved with various speech samples.
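To make the link between the least Pth norm and the minimax criterion concrete, the following is a minimal sketch and not the authors' implementation. It evaluates a least Pth error for a symmetric, odd-length FIR filter on a frequency grid and shows that the value approaches the peak (minimax) error as p grows. The function names, the windowed-sinc test filter, the band edges, and the SNR definition (signal energy over error energy, in dB) are all assumptions made for illustration.

```python
import numpy as np


def amplitude_response(h, omega):
    """Zero-phase amplitude of a symmetric, odd-length (Type I) FIR filter."""
    h = np.asarray(h, dtype=float)
    mid = (len(h) - 1) // 2
    k = np.arange(1, mid + 1)
    # A(w) = h[mid] + 2 * sum_k h[mid + k] * cos(k * w)
    return h[mid] + 2.0 * (np.cos(np.outer(omega, k)) @ h[mid + 1:])


def least_pth_error(h, omega, desired, p):
    """Least Pth norm of the amplitude error on the grid.

    The error is scaled by its peak before raising to the power p, which
    avoids overflow and underflow when p is large.
    """
    err = np.abs(amplitude_response(h, omega) - desired)
    peak = err.max()
    if peak == 0.0:
        return 0.0
    return peak * np.sum((err / peak) ** p) ** (1.0 / p)


def snr_db(clean, estimate):
    """SNR in dB, assuming noise is defined as clean minus estimate."""
    clean = np.asarray(clean, dtype=float)
    noise = clean - np.asarray(estimate, dtype=float)
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(noise ** 2))


def demo_filter(num_taps=21, cutoff=0.4 * np.pi):
    """Hypothetical test filter: Hamming-windowed ideal low-pass (odd length)."""
    m = np.arange(num_taps) - (num_taps - 1) // 2
    return (cutoff / np.pi) * np.sinc(cutoff * m / np.pi) * np.hamming(num_taps)


if __name__ == "__main__":
    h = demo_filter()
    # Assumed band edges: passband [0, 0.3*pi], stopband [0.5*pi, pi].
    omega = np.concatenate([np.linspace(0.0, 0.3 * np.pi, 100),
                            np.linspace(0.5 * np.pi, np.pi, 100)])
    desired = np.concatenate([np.ones(100), np.zeros(100)])
    peak = np.max(np.abs(amplitude_response(h, omega) - desired))
    for p in (2, 8, 32, 128):
        print(f"p = {p:3d}: least Pth error = {least_pth_error(h, omega, desired, p):.6f}")
    print(f"peak (minimax) error     = {peak:.6f}")
```

In a design loop, the coefficients `h` would be adjusted by a numerical optimizer to minimize `least_pth_error` for a large, even p; that step is omitted here because the abstract does not specify the optimization procedure.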
Cite this article
Gokhale, M.Y., Khanduja, D.K. Analysis and synthesis of speech using least Pth norm filter design. Int J Speech Technol 11, 51–61 (2008). https://doi.org/10.1007/s10772-009-9035-7
DOI: https://doi.org/10.1007/s10772-009-9035-7