Abstract
Previously, we proposed a speaker recognition system that combines MFCC-based vocal tract features with phase information, which carries rich vocal source information. In this paper, we investigate the effectiveness of combining various vocal tract features (MFCC and LPCC) and vocal source features (phase and LPC residual) for normal-duration and short-duration utterances. The Japanese Newspaper Article Sentence (JNAS) database was used to evaluate the proposed method. Combining the various vocal tract and vocal source features achieved a remarkable improvement over the conventional MFCC-based vocal tract feature for both normal-duration and short-duration utterances.
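The paper's own pipeline is not reproduced here. The following is a minimal, pure-Python sketch of how the two LPC-derived features named above, LPCC (vocal tract) and the LPC residual (vocal source), can be computed from one frame, and how per-feature scores might be combined. It rests on several assumptions: LPC is estimated with the autocorrelation method and Levinson-Durbin recursion, LPCC uses the standard all-pole cepstrum recursion, the residual comes from inverse filtering, and combination is a weighted sum of per-feature log-likelihood scores. All function names, the prediction order, and the fusion weights are hypothetical. MFCC and phase extraction, and the per-speaker models that would produce the scores, are omitted.

```python
import random


def autocorr(frame, order):
    """Autocorrelation r[0..order] of one (already windowed) analysis frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k)) for k in range(order + 1)]


def levinson_durbin(r, order):
    """LPC coefficients for the predictor x[n] ~ sum_{k=1..p} a[k] * x[n-k].

    Returns (a, err): a[0] is an unused placeholder, err is the final
    prediction-error energy. Assumes r[0] > 0 (a non-silent frame).
    """
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i.
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a, err


def lpc_to_cepstrum(a, order, n_ceps):
    """LPCC c[1..n_ceps] via the standard all-pole recursion (gain term c[0] omitted)."""
    c = [0.0] * (n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = a[n] if n <= order else 0.0
        for k in range(max(1, n - order), n):
            acc += (k / n) * c[k] * a[n - k]
        c[n] = acc
    return c[1:]


def lpc_residual(frame, a, order):
    """Inverse filtering: e[n] = x[n] - sum_k a[k] * x[n-k]; samples before the frame are 0."""
    return [
        frame[n] - sum(a[k] * frame[n - k] for k in range(1, order + 1) if n - k >= 0)
        for n in range(len(frame))
    ]


def fused_score(scores, weights):
    """Hypothetical score-level fusion: weighted sum of per-feature log-likelihoods."""
    return sum(w * s for w, s in zip(weights, scores))


def identify(score_table, weights):
    """Return the speaker with the highest fused score.

    score_table maps speaker -> [score per feature stream], e.g. MFCC, LPCC, phase, residual.
    """
    return max(score_table, key=lambda spk: fused_score(score_table[spk], weights))


if __name__ == "__main__":
    # Synthetic AR(1) signal x[n] = 0.9 x[n-1] + noise, standing in for a speech frame.
    random.seed(0)
    x = [0.0]
    for _ in range(399):
        x.append(0.9 * x[-1] + random.gauss(0.0, 1.0))
    p = 12  # hypothetical prediction order
    a, _ = levinson_durbin(autocorr(x, p), p)
    print("a[1]:", round(a[1], 3))  # estimates the AR(1) coefficient used to generate x
    print("LPCC:", [round(v, 3) for v in lpc_to_cepstrum(a, p, 5)])
    print("residual energy:", round(sum(e * e for e in lpc_residual(x, a, p)), 1))
```

For real speech, each frame would typically be pre-emphasized and windowed before `autocorr`, and the fusion weights would be tuned on development data; both choices are assumptions here, not settings taken from the paper.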
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Kawakami, Y., Wang, L., Kai, A., Nakagawa, S. (2014). Speaker Identification by Combining Various Vocal Tract and Vocal Source Features. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2014. Lecture Notes in Computer Science, vol 8655. Springer, Cham. https://doi.org/10.1007/978-3-319-10816-2_46
DOI: https://doi.org/10.1007/978-3-319-10816-2_46
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10815-5
Online ISBN: 978-3-319-10816-2
eBook Packages: Computer Science, Computer Science (R0)