Speaker Identification by Combining Various Vocal Tract and Vocal Source Features

Kawakami, Yuta; Wang, Longbiao; Kai, Atsuhiko; Nakagawa, Seiichi

doi:10.1007/978-3-319-10816-2_46


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8655)

Included in the following conference series:

International Conference on Text, Speech, and Dialogue


Abstract

Previously, we proposed a speaker recognition system that combines an MFCC-based vocal tract feature with phase information, which is rich in vocal source information. In this paper, we investigate the effectiveness of combining various vocal tract features (MFCC and LPCC) with vocal source features (phase and LPC residual) for normal-duration and short-duration utterances. The Japanese Newspaper Article Sentence (JNAS) database was used to evaluate the proposed method. The combination of various vocal tract and vocal source features achieved a remarkable improvement over the conventional MFCC-based vocal tract feature for both normal-duration and short-duration utterances.
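The abstract does not state how the feature streams are combined, so the following is only a minimal sketch of one common option, weighted score-level fusion. It assumes each stream (MFCC, LPCC, phase, LPC residual) has already produced one log-likelihood per enrolled speaker. The function names, the weights, and the toy scores are hypothetical and are not taken from the paper.

```python
import numpy as np


def fuse_scores(stream_scores, weights):
    """Return the weighted average of per-speaker scores across streams.

    stream_scores: dict mapping a stream name to a 1-D array holding one
        log-likelihood per enrolled speaker (assumed precomputed).
    weights: dict mapping the same stream names to non-negative weights
        (hypothetical values; they are normalised to sum to 1 here).
    """
    names = sorted(stream_scores)
    total = sum(weights[name] for name in names)
    fused = np.zeros_like(np.asarray(stream_scores[names[0]], dtype=float))
    for name in names:
        scores = np.asarray(stream_scores[name], dtype=float)
        fused += (weights[name] / total) * scores
    return fused


def identify(stream_scores, weights):
    """Return the index of the speaker with the highest fused score."""
    return int(np.argmax(fuse_scores(stream_scores, weights)))


# Toy log-likelihoods for three enrolled speakers (illustrative only).
scores = {
    "mfcc": np.array([-10.0, -12.0, -11.0]),
    "lpcc": np.array([-9.5, -12.5, -9.0]),
    "phase": np.array([-20.0, -18.0, -19.5]),
    "lpc_residual": np.array([-15.0, -14.0, -15.5]),
}
weights = {"mfcc": 0.4, "lpcc": 0.2, "phase": 0.2, "lpc_residual": 0.2}

fused = fuse_scores(scores, weights)
best_speaker = identify(scores, weights)
```

In this toy case no single stream decides the outcome: the LPCC stream alone would prefer the third speaker and the phase stream the second, while the fused score selects the first. In practice the weights would be tuned on held-out data.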



Author information

Authors and Affiliations

Nagaoka University of Technology, Japan
Yuta Kawakami & Longbiao Wang
Shizuoka University, Japan
Atsuhiko Kai
Toyohashi University of Technology, Japan
Seiichi Nakagawa


Editor information

Editors and Affiliations

Faculty of Informatics, Masaryk University, Botanická 6a, 60200, Brno, Czech Republic
Petr Sojka
Faculty of Informatics, Department of Information Technologies, Masaryk University, 602 00, Brno, Czech Republic
Aleš Horák, Ivan Kopeček & Karel Pala


Cite this paper

Kawakami, Y., Wang, L., Kai, A., Nakagawa, S. (2014). Speaker Identification by Combining Various Vocal Tract and Vocal Source Features. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2014. Lecture Notes in Computer Science, vol 8655. Springer, Cham. https://doi.org/10.1007/978-3-319-10816-2_46


DOI: https://doi.org/10.1007/978-3-319-10816-2_46
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10815-5
Online ISBN: 978-3-319-10816-2
eBook Packages: Computer Science, Computer Science (R0)
