Spectral entropy and spectral shape based pre-quantization for real time speaker identification system

International Journal of Speech Technology

Abstract

Pre-processing is a vital step in developing a robust and efficient recognition system. Better pre-processing not only aids data selection but also significantly reduces computational complexity. Further, an efficient frame selection technique can improve the overall performance of the system. Pre-quantization (PQ) is the technique of selecting fewer frames in the pre-processing stage to reduce the computational burden in the post-processing stages of speaker identification (SI). In this paper, we develop PQ techniques based on spectral entropy and spectral shape to pick suitable frames containing speaker-specific information, which varies from frame to frame depending on the spoken text and environmental conditions. The approach exploits the statistical properties of the distributions of speech frames at the pre-processing stage of speaker recognition. Our aim is not only to reduce the frame rate but also to keep identification accuracy reasonably high. We also analyze the robustness of the proposed techniques on noisy utterances. To establish the efficacy of the proposed methods, we use two different databases: POLYCOST (telephone speech) and YOHO (microphone speech).
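
The general idea of entropy-based frame selection can be illustrated with a short sketch. It is not the algorithm of this paper, whose selection criterion is not given in the abstract. It only uses the common definition of spectral entropy (the Shannon entropy of a frame's power spectrum normalized to sum to one) and a hypothetical rule that keeps the lowest-entropy fraction of frames. The function names, the 256-sample frame length, the 128-sample hop, the Hamming window, and the `keep_ratio` parameter are all assumptions made for illustration.

```python
import numpy as np

def frame_signal(signal, frame_len, hop):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx]

def spectral_entropy(frames, eps=1e-12):
    """Shannon entropy (bits) of each frame's normalized power spectrum."""
    windowed = frames * np.hamming(frames.shape[1])
    power = np.abs(np.fft.rfft(windowed, axis=1)) ** 2
    # Normalize each spectrum so it can be treated as a probability mass function.
    pmf = power / (power.sum(axis=1, keepdims=True) + eps)
    return -np.sum(pmf * np.log2(pmf + eps), axis=1)

def select_frames(frames, keep_ratio=0.5):
    """Hypothetical rule: keep the lowest-entropy fraction of frames, in time order."""
    entropy = spectral_entropy(frames)
    n_keep = max(1, int(round(keep_ratio * len(frames))))
    keep = np.sort(np.argsort(entropy)[:n_keep])
    return frames[keep], keep

# Toy input (assumed): half a second of a 440 Hz tone followed by white noise at 8 kHz.
rng = np.random.default_rng(0)
t = np.arange(4000) / 8000.0
signal = np.concatenate([np.sin(2 * np.pi * 440 * t), rng.standard_normal(4000)])

frames = frame_signal(signal, frame_len=256, hop=128)
selected, kept_idx = select_frames(frames, keep_ratio=0.25)
print(frames.shape, selected.shape)
```

In this toy input the tone frames have a peaked spectrum and therefore low entropy, while the noise frames have a nearly flat spectrum and high entropy, so the rule retains tone frames only. Whether low or high entropy marks the more useful frames for speaker identification is a design choice that this sketch does not settle.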

Author information

Corresponding author

Correspondence to Gourav Sarkar.

About this article

Cite this article

Sarkar, G., Saha, G. Spectral entropy and spectral shape based pre-quantization for real time speaker identification system. Int J Speech Technol 13, 189–199 (2010). https://doi.org/10.1007/s10772-010-9079-8

  • DOI: https://doi.org/10.1007/s10772-010-9079-8
