Combining Evidences from Mel Cepstral and Cochlear Cepstral Features for Speaker Recognition Using Whispered Speech

  • Conference paper
Text, Speech, and Dialogue (TSD 2015)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9302)

Abstract

Whispering is an alternative mode of speech communication, used especially when a speaker does not want to reveal information to anyone other than the target listeners. Generally, speaker-specific information is present in both the excitation source and the vocal tract system. However, whispered speech does not contain significant source characteristics, as there is almost no excitation by the vocal folds, and the speaker information in the vocal tract system is also lower than in normal speech. Hence, it is difficult to recognize a speaker from his/her whispered speech. To address this, features based on vocal tract system characteristics are proposed, namely the state-of-the-art Mel Frequency Cepstral Coefficients (MFCC) and the recently developed Cochlear Frequency Cepstral Coefficients (CFCC). Experiments are conducted on the CHAINS (Characterizing Individual Speakers) whispered speech database using the GMM-UBM (Gaussian Mixture Modeling-Universal Background Modeling) approach. The experiments show that the score-level fusion of CFCC and MFCC improves both the % IR (Identification Rate) and the % EER (Equal Error Rate) over MFCC alone, indicating that the two feature sets capture complementary speaker-specific information.
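
The abstract does not say how the scores are fused or how the two metrics are computed, so the following is only a minimal sketch under stated assumptions: fusion is taken to be a weighted sum of per-trial scores from an MFCC system and a CFCC system, % IR is taken to be the share of test utterances whose top-scoring speaker model is the true speaker, and % EER is approximated by sweeping a threshold over the observed scores. The function names, the weight `alpha`, and the toy scores are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def fuse_scores(mfcc: Sequence[float], cfcc: Sequence[float],
                alpha: float = 0.5) -> List[float]:
    """Weighted-sum fusion (assumed rule): alpha * MFCC + (1 - alpha) * CFCC."""
    if len(mfcc) != len(cfcc):
        raise ValueError("score lists must have the same length")
    return [alpha * m + (1.0 - alpha) * c for m, c in zip(mfcc, cfcc)]


def identification_rate(score_matrix: Sequence[Sequence[float]],
                        true_ids: Sequence[int]) -> float:
    """Percentage of utterances whose best-scoring model index is the true speaker."""
    correct = 0
    for row, true_id in zip(score_matrix, true_ids):
        best = max(range(len(row)), key=lambda i: row[i])
        if best == true_id:
            correct += 1
    return 100.0 * correct / len(true_ids)


def equal_error_rate(genuine: Sequence[float],
                     impostor: Sequence[float]) -> float:
    """Approximate % EER: mean of FAR and FRR at the threshold where they are closest."""
    best_gap, eer = float("inf"), 100.0
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)  # false acceptances
        frr = sum(s < t for s in genuine) / len(genuine)     # false rejections
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, 100.0 * (far + frr) / 2.0
    return eer


if __name__ == "__main__":
    # Toy, made-up scores for three genuine and three impostor trials.
    genuine = fuse_scores([0.9, 0.7, 0.2], [0.9, 0.9, 0.4])
    impostor = fuse_scores([0.1, 0.3, 0.8], [0.1, 0.1, 0.9])
    print("fused genuine scores:", genuine)
    print("approximate % EER:", equal_error_rate(genuine, impostor))
```

In practice the fusion weight would be tuned on held-out data, and the per-system scores would typically be log-likelihood ratios from the GMM-UBM models, normalized to a common range before being combined.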

Author information

Corresponding author

Correspondence to Aditya Raikar.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Raikar, A., Gandhi, A., Patil, H.A. (2015). Combining Evidences from Mel Cepstral and Cochlear Cepstral Features for Speaker Recognition Using Whispered Speech. In: Král, P., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2015. Lecture Notes in Computer Science, vol 9302. Springer, Cham. https://doi.org/10.1007/978-3-319-24033-6_46

  • DOI: https://doi.org/10.1007/978-3-319-24033-6_46

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-24032-9

  • Online ISBN: 978-3-319-24033-6

  • eBook Packages: Computer Science, Computer Science (R0)
