
Speaker Identification by Combining Various Vocal Tract and Vocal Source Features

  • Conference paper
Text, Speech and Dialogue (TSD 2014)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8655)


Abstract

Previously, we proposed a speaker recognition system that combines an MFCC-based vocal tract feature with phase information, which carries rich vocal source information. In this paper, we investigate the effectiveness of combining various vocal tract features (MFCC and LPCC) and vocal source features (phase and LPC residual) for normal-duration and short-duration utterances. The Japanese Newspaper Article Sentence (JNAS) database was used to evaluate the proposed method. The combination of various vocal tract and vocal source features achieved a remarkable improvement over the conventional MFCC-based vocal tract feature for both normal-duration and short-duration utterances.
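The abstract does not describe how the LPC-based features are computed or how the feature streams are combined, so the following is only a rough sketch built from textbook definitions, not the authors' implementation. It assumes the usual all-pole model 1 / (1 - Σ a_k z^-k): linear prediction coefficients come from the Levinson-Durbin recursion, the LPC residual (a vocal source feature) comes from inverse filtering, and the LPCC (a vocal tract feature) comes from the standard LPC-to-cepstrum recursion. Combining features by a weighted sum of per-feature speaker log-likelihoods is an assumed, commonly used score-level fusion; the function names, the score dictionaries and the weights are all hypothetical. MFCC and phase extraction are not shown.

```python
def autocorr(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one (already windowed) frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def lpc(frame, order):
    """Levinson-Durbin recursion.

    Returns (a, err) where s[n] is predicted as sum_{k=1..order} a[k-1] * s[n-k]
    and err is the final prediction-error energy.
    """
    r = autocorr(frame, order)
    a = [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # silent or degenerate frame: stop early
            break
        acc = r[i] - sum(a[j - 1] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient for this stage
        prev = a[:]
        a[i - 1] = k
        for j in range(1, i):
            a[j - 1] = prev[j - 1] - k * prev[i - j - 1]
        err *= 1.0 - k * k
    return a, err


def lpc_residual(signal, a):
    """Inverse filtering: e[n] = s[n] - sum_k a[k-1] * s[n-k], zero initial state."""
    p = len(a)
    return [signal[n] - sum(a[k - 1] * signal[n - k]
                            for k in range(1, p + 1) if n - k >= 0)
            for n in range(len(signal))]


def lpcc(a, n_ceps):
    """Cepstrum c_1..c_n_ceps of the all-pole model 1 / (1 - sum_k a_k z^-k).

    c_n = a_n + sum_{k=1}^{n-1} (k/n) c_k a_{n-k}, with a_n = 0 for n > p.
    The gain term c_0 is omitted.
    """
    p = len(a)
    c = [0.0] * (n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]


def identify(scores_by_feature, weights):
    """Hypothetical score-level fusion: weighted sum of per-feature log-likelihoods.

    scores_by_feature: {feature_name: {speaker_id: log_likelihood}}
    weights:           {feature_name: weight}, e.g. tuned on development data
    Returns the best-scoring speaker and the fused scores.
    """
    speakers = next(iter(scores_by_feature.values())).keys()
    fused = {spk: sum(weights[f] * scores[spk]
                      for f, scores in scores_by_feature.items())
             for spk in speakers}
    return max(fused, key=fused.get), fused
```

In use, one would call `lpc` on each windowed frame, pass the coefficients to `lpcc` for a vocal tract vector and to `lpc_residual` for a source signal, score each stream against per-speaker models (not shown), and hand the per-feature scores to `identify`. Score-level fusion is chosen here only because it keeps each feature stream's model independent.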




Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Kawakami, Y., Wang, L., Kai, A., Nakagawa, S. (2014). Speaker Identification by Combining Various Vocal Tract and Vocal Source Features. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2014. Lecture Notes in Computer Science, vol 8655. Springer, Cham. https://doi.org/10.1007/978-3-319-10816-2_46


  • DOI: https://doi.org/10.1007/978-3-319-10816-2_46

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-10815-5

  • Online ISBN: 978-3-319-10816-2

  • eBook Packages: Computer Science, Computer Science (R0)
