Robust feature extraction from spectrum estimated using bispectrum for speaker recognition

International Journal of Speech Technology

Abstract

Extracting robust features from noisy speech signals is one of the challenging problems in speaker recognition. Because the bispectrum and all higher-order spectra of a Gaussian process are identically zero, bispectral analysis suppresses additive white Gaussian noise while preserving the magnitude and phase information of the original signal. Using this property, the spectrum of the original signal can be recovered from its noisy version. Robust Mel Frequency Cepstral Coefficients (MFCC) are then extracted from the estimated spectral magnitude; these features are denoted Bispectral-MFCC (BMFCC). The effectiveness of BMFCC has been tested on the TIMIT and SGGS databases in noisy environments. At 20 dB SNR, the proposed BMFCC features yield speaker recognition rates of 95.30 %, 97.26 % and 94.22 % on the TIMIT, SGGS and SGGS2 databases, respectively; at 0 dB SNR, the corresponding rates are 45.84 %, 50.79 % and 44.98 %. The experimental results show that the proposed technique outperforms conventional methods on all databases.
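The property the abstract relies on, that the bispectrum of a Gaussian process vanishes, can be illustrated numerically. The following is a minimal sketch and not the authors' BMFCC pipeline: it assumes a direct (FFT-based) bispectrum estimate averaged over non-overlapping frames, uses synthetic white noise rather than speech, and the function name `bispectrum_direct`, the frame length and the sample count are all hypothetical choices made for illustration.

```python
import numpy as np


def bispectrum_direct(x, nfft=32):
    """Direct bispectrum estimate (hypothetical helper, illustration only).

    Averages X(f1) * X(f2) * conj(X(f1 + f2)) over non-overlapping frames
    of length nfft, with frequency indices taken modulo nfft.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                      # remove the global mean
    n_frames = len(x) // nfft
    frames = x[: n_frames * nfft].reshape(n_frames, nfft)
    spectra = np.fft.fft(frames, axis=1)

    k = np.arange(nfft)
    sum_idx = (k[:, None] + k[None, :]) % nfft   # bin index of f1 + f2

    acc = np.zeros((nfft, nfft), dtype=complex)
    for X in spectra:
        # Triple product for one frame, accumulated over all frames.
        acc += np.outer(X, X) * np.conj(X[sum_idx])
    # Dividing by nfft scales the estimate to the third-order cumulant
    # of a white input.
    return acc / (n_frames * nfft)


rng = np.random.default_rng(0)
n_samples = 32 * 4000

# Assumption: two unit-variance white processes, one Gaussian and one
# skewed (a centred exponential), stand in for "noise" and "signal".
gaussian = rng.standard_normal(n_samples)
skewed = rng.exponential(1.0, n_samples) - 1.0

gauss_level = np.abs(bispectrum_direct(gaussian)).mean()
skew_level = np.abs(bispectrum_direct(skewed)).mean()

print(f"mean |B| for Gaussian noise: {gauss_level:.3f}")
print(f"mean |B| for skewed noise:   {skew_level:.3f}")
```

With enough frames, the Gaussian estimate shrinks toward zero while the estimate for the skewed process stays clearly non-zero, which is the behaviour the abstract exploits to separate a non-Gaussian signal from additive Gaussian noise. Recovering a spectral magnitude from the bispectrum and computing MFCCs from it are separate steps that this sketch does not attempt.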

Author information

Corresponding author

Correspondence to Pawan K. Ajmera.

About this article

Cite this article

Ajmera, P.K., Nehe, N.S., Jadhav, D.V. et al. Robust feature extraction from spectrum estimated using bispectrum for speaker recognition. Int J Speech Technol 15, 433–440 (2012). https://doi.org/10.1007/s10772-012-9153-5

  • DOI: https://doi.org/10.1007/s10772-012-9153-5
