Abstract
Extracting robust features from noisy speech signals is one of the challenging problems in speaker recognition. Because the bispectrum, like all higher-order spectra, of a Gaussian process is identically zero, bispectral analysis suppresses additive white Gaussian noise while preserving the magnitude and phase information of the original signal. Using this property, the spectrum of the original signal can be estimated from its noisy version. Robust Mel-frequency cepstral coefficients (MFCCs), denoted Bispectral-MFCC (BMFCC), are then extracted from the estimated spectral magnitude. The effectiveness of BMFCC has been tested on the TIMIT and SGGS databases in noisy environments. At 20 dB SNR, the proposed BMFCC features yield speaker recognition rates of 95.30 %, 97.26 % and 94.22 % on the TIMIT, SGGS and SGGS2 databases, respectively. At 0 dB SNR, the corresponding rates are 45.84 %, 50.79 % and 44.98 %. The experimental results show that the proposed technique outperforms conventional methods on all databases.
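The abstract does not say how the spectral magnitude is recovered from the bispectrum. The following is a minimal NumPy sketch of the overall BMFCC idea, and it rests on assumptions that are not the authors' published method. It uses three stages:

1. A direct (segment-averaged FFT) bispectrum estimate, B(k, l) = mean of X(k) X(l) X*(k + l).
2. A least-squares recovery of log|X(k)| from log|B(k, l)| = log|X(k)| + log|X(l)| + log|X(k + l)|. This step assumes the signal behaves like a non-Gaussian linear process, so that B(k, l) is proportional to H(k) H(l) H*(k + l).
3. A conventional mel filterbank, log and DCT MFCC stage.

The function names, segment length, hop, 20-filter mel bank and synthetic test signal are all hypothetical choices made for illustration.

```python
import numpy as np


def bispectrum_direct(x, nfft=256, hop=128):
    """Direct (FFT-based) bispectrum estimate averaged over Hann-windowed segments.

    Returns an (m, m) array, m = nfft // 2, with
    B[k, l] = mean over segments of X(k) * X(l) * conj(X(k + l)).
    """
    x = np.asarray(x, dtype=float)
    if len(x) < nfft:
        raise ValueError("signal shorter than one segment")
    m = nfft // 2
    win = np.hanning(nfft)
    kl = np.arange(m)[:, None] + np.arange(m)[None, :]  # k + l <= nfft - 2, always a valid bin
    B = np.zeros((m, m), dtype=complex)
    count = 0
    for start in range(0, len(x) - nfft + 1, hop):
        seg = x[start:start + nfft]
        X = np.fft.fft(win * (seg - seg.mean()))  # third-order cumulants assume zero mean
        B += X[:m, None] * X[None, :m] * np.conj(X[kl])
        count += 1
    return B / count


def magnitude_from_bispectrum(B, eps=1e-12):
    """Least-squares estimate of |X(1)|..|X(M)|, M = B.shape[0] - 1, assuming
    log|B(k, l)| = a(k) + a(l) + a(k + l) with a(k) = log|X(k)|,
    using the region 1 <= l <= k, k + l <= M.
    """
    M = B.shape[0] - 1
    rows, rhs = [], []
    for k in range(1, M + 1):
        for l in range(1, k + 1):
            if k + l > M:
                break
            row = np.zeros(M)
            row[k - 1] += 1.0
            row[l - 1] += 1.0
            row[k + l - 1] += 1.0
            rows.append(row)
            rhs.append(np.log(np.abs(B[k, l]) + eps))  # eps guards against log(0)
    a, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return np.exp(a)


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mfcc_from_magnitude(mag, sr, nfft, n_mels=20, n_ceps=13):
    """Triangular mel filterbank -> log energies -> unnormalised DCT-II.

    `mag` is a one-sided magnitude spectrum whose index i is FFT bin i.
    """
    freqs = np.arange(len(mag)) * sr / nfft
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_mels + 2))
    fbank = np.zeros((n_mels, len(mag)))
    for i in range(n_mels):
        lo, centre, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (freqs - lo) / (centre - lo)
        falling = (hi - freqs) / (hi - centre)
        fbank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    log_energy = np.log(fbank @ (mag ** 2) + 1e-10)
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.arange(n_ceps)[:, None] * (n[None, :] + 0.5) / n_mels)
    return dct @ log_energy


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sr, nfft = 8000, 256
    # Hypothetical signal: skewed (non-Gaussian) i.i.d. excitation through a short FIR filter,
    # plus additive white Gaussian noise.
    clean = np.convolve(rng.exponential(size=2 ** 15) - 1.0, [1.0, 0.6, 0.3], mode="same")
    noisy = clean + rng.standard_normal(clean.size)
    mag = magnitude_from_bispectrum(bispectrum_direct(noisy, nfft))
    bmfcc = mfcc_from_magnitude(np.concatenate(([0.0], mag)), sr, nfft)
    print(np.round(bmfcc, 2))
```

With finite data, the Gaussian-noise contribution to the averaged bispectrum does not vanish outright. It shrinks roughly as one over the square root of the number of averaged segments.

Under the linear-process assumption, the recovered magnitude is |H(k)| only up to the constant factor |γ3|^(1/3), where γ3 is the third-order cumulant of the excitation. A constant scale shifts every log filterbank energy by the same amount, so it changes essentially only the first cepstral coefficient c0.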

Cite this article
Ajmera, P.K., Nehe, N.S., Jadhav, D.V. et al. Robust feature extraction from spectrum estimated using bispectrum for speaker recognition. Int J Speech Technol 15, 433–440 (2012). https://doi.org/10.1007/s10772-012-9153-5