Spectral entropy and spectral shape based pre-quantization for real time speaker identification system

Sarkar, Gourav; Saha, Goutam

doi:10.1007/s10772-010-9079-8

Published: 09 October 2010

Volume 13, pages 189–199, (2010)

International Journal of Speech Technology

Gourav Sarkar¹ &
Goutam Saha¹

Abstract

Pre-processing is a vital step in developing a robust and efficient recognition system. Better pre-processing not only aids data selection but also significantly reduces computational complexity. Further, an efficient frame selection technique can improve the overall performance of the system. Pre-quantization (PQ) is the technique of selecting fewer frames in the pre-processing stage to reduce the computational burden in the post-processing stages of speaker identification (SI). In this paper, we develop PQ techniques based on spectral entropy and spectral shape to pick suitable frames containing speaker-specific information, which varies from frame to frame depending on the spoken text and environmental conditions. The approach exploits the statistical properties of the distributions of speech frames at the pre-processing stage of speaker recognition. Our aim is not only to reduce the frame rate but also to keep identification accuracy reasonably high. We also analyze the robustness of the proposed techniques on noisy utterances. To establish the efficacy of the proposed methods, we use two different databases: POLYCOST (telephone speech) and YOHO (microphone speech).
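The abstract does not give the selection rule, so the following is only a minimal sketch of what entropy-based pre-quantization could look like, not the authors' method. It treats the normalized power spectrum of each frame as a probability distribution, computes its Shannon entropy, and keeps the fraction of frames with the lowest entropy. The function names, the `keep_ratio` parameter, the "keep the lowest-entropy frames" criterion, and the handling of silent frames are all assumptions made for illustration; the spectral-shape criterion is not covered.

```python
import cmath
import math


def power_spectrum(frame):
    """Naive O(N^2) DFT power spectrum of a real frame (bins 0..N//2)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def spectral_entropy(frame):
    """Shannon entropy (in bits) of the normalized power spectrum."""
    power = power_spectrum(frame)
    total = sum(power)
    if total == 0.0:
        # Assumption: a silent frame gets the maximum possible entropy,
        # so it is never preferred by the selection step below.
        return math.log2(len(power))
    probs = [p / total for p in power]
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


def select_frames(frames, keep_ratio=0.5):
    """Return sorted indices of the lowest-entropy frames.

    Hypothetical selection rule: keep the `keep_ratio` fraction of frames
    whose spectra are most peaked (lowest spectral entropy).
    """
    n_keep = max(1, int(round(keep_ratio * len(frames))))
    ranked = sorted(range(len(frames)),
                    key=lambda i: spectral_entropy(frames[i]))
    return sorted(ranked[:n_keep])


if __name__ == "__main__":
    n = 32
    tone = [math.sin(2 * math.pi * 4 * t / n) for t in range(n)]  # peaked spectrum
    impulse = [1.0] + [0.0] * (n - 1)                             # flat spectrum
    print(select_frames([impulse, tone, impulse, tone], keep_ratio=0.5))
```

Only the selected frames would then be passed on to feature extraction and speaker modeling, which is where the computational saving would come from. A practical system would replace the naive DFT with an FFT and would likely tune the threshold or ratio on held-out data.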

Author information

Authors and Affiliations

Department of Electronics and Electrical Communication Engineering, IIT Kharagpur, PIN 721302, India
Gourav Sarkar & Goutam Saha

Corresponding author

Correspondence to Gourav Sarkar.

About this article

Cite this article

Sarkar, G., Saha, G. Spectral entropy and spectral shape based pre-quantization for real time speaker identification system. Int J Speech Technol 13, 189–199 (2010). https://doi.org/10.1007/s10772-010-9079-8

Received: 21 May 2010
Accepted: 21 September 2010
Published: 09 October 2010
Issue Date: December 2010
DOI: https://doi.org/10.1007/s10772-010-9079-8
