Abstract
Pre-processing is one of the vital steps in developing a robust and efficient recognition system. Better pre-processing not only aids data selection but also significantly reduces computational complexity. Further, an efficient frame selection technique can improve the overall performance of the system. Pre-quantization (PQ) is the technique of selecting fewer frames in the pre-processing stage to reduce the computational burden in the post-processing stages of speaker identification (SI). In this paper, we develop PQ techniques based on spectral entropy and spectral shape to pick suitable frames containing speaker-specific information, which varies from frame to frame depending on the spoken text and environmental conditions. The attempt is to exploit the statistical properties of the distributions of speech frames at the pre-processing stage of speaker recognition. Our aim is not only to reduce the frame rate but also to keep identification accuracy reasonably high. Further, we analyze the robustness of the proposed techniques on noisy utterances. To establish the efficacy of the proposed methods, we use two different databases, POLYCOST (telephone speech) and YOHO (microphone speech).
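As a rough illustration of entropy-based pre-quantization, the sketch below computes the spectral entropy of each frame (the Shannon entropy of its normalized power spectrum) and keeps only a fraction of the frames. It is not the method of the paper: the abstract does not state the selection rule, frame length, or thresholds, so the lowest-entropy criterion, the Hamming window, the `keep_ratio` parameter, and all function names are assumptions made for this example.

```python
import numpy as np

def frame_signal(signal, frame_len, hop):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx]

def spectral_entropy(frames, eps=1e-12):
    """Shannon entropy (in bits) of each frame's normalized power spectrum."""
    window = np.hamming(frames.shape[1])
    power = np.abs(np.fft.rfft(frames * window, axis=1)) ** 2
    # Normalize each spectrum so it can be treated as a probability mass function.
    pmf = power / (power.sum(axis=1, keepdims=True) + eps)
    return -(pmf * np.log2(pmf + eps)).sum(axis=1)

def select_frames(frames, keep_ratio=0.5):
    """Return indices of the kept frames and the entropy of every frame.

    Assumption for this sketch: frames with the LOWEST spectral entropy
    (peaky, structured spectra) are kept. The paper may use another rule.
    """
    entropy = spectral_entropy(frames)
    n_keep = max(1, int(round(keep_ratio * len(frames))))
    keep = np.sort(np.argsort(entropy)[:n_keep])
    return keep, entropy

# Synthetic demo: 0.5 s of a 440 Hz tone followed by 0.5 s of white noise.
rng = np.random.default_rng(0)
fs = 8000
t = np.arange(fs // 2) / fs
signal = np.concatenate([np.sin(2 * np.pi * 440 * t),
                         rng.standard_normal(fs // 2)])

frames = frame_signal(signal, frame_len=256, hop=128)
keep, entropy = select_frames(frames, keep_ratio=0.25)
print(f"kept {len(keep)} of {len(frames)} frames")
```

On this synthetic signal the retained frames come from the tonal half, whose spectrum is concentrated in a few bins, while the flat-spectrum noise frames are discarded. Only the retained frames would then be passed to feature extraction and speaker modelling, which is where the computational saving arises.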
Cite this article
Sarkar, G., Saha, G. Spectral entropy and spectral shape based pre-quantization for real time speaker identification system. Int J Speech Technol 13, 189–199 (2010). https://doi.org/10.1007/s10772-010-9079-8
DOI: https://doi.org/10.1007/s10772-010-9079-8