Abstract
This paper investigates whether integrating complementary acoustic features improves speaker identification performance. The complementary contributions of two acoustic features are studied: the conventional vocal-tract-related feature MFCC and the recently proposed vocal-source-related feature WOCOR. To take full advantage of the complementarity between the two, an integrated system is proposed that performs score-level fusion of MFCC and WOCOR, with a confidence measure as the weighting parameter. The confidence measure is derived from the speaker discrimination power of MFCC and of WOCOR in each individual identification trial, so that more weight is given to the feature that discriminates speakers with higher confidence. Experiments show that fusion with this confidence-measure-based varying weight outperforms fusion with a pre-trained fixed weight in speaker identification.
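The abstract does not give the formula for the confidence measure, so the following is only a minimal sketch of per-trial, confidence-weighted score-level fusion under stated assumptions. The function names (`top_two_margin`, `fuse_scores`), the margin-based confidence, and the way the weight is formed from the two confidences are all hypothetical illustrations, not the measure defined in the paper. Each input list is assumed to hold one matching score per enrolled speaker (higher is better), already brought to comparable scales.

```python
from typing import Sequence, Tuple


def top_two_margin(scores: Sequence[float]) -> float:
    """Hypothetical confidence: gap between the best and second-best score,
    divided by the full score range of this feature stream."""
    if len(scores) < 2:
        raise ValueError("need scores for at least two speakers")
    ordered = sorted(scores, reverse=True)
    spread = ordered[0] - ordered[-1]
    if spread == 0.0:
        return 0.0  # every speaker ties: this stream gives no evidence
    return (ordered[0] - ordered[1]) / spread


def fuse_scores(
    mfcc_scores: Sequence[float],
    wocor_scores: Sequence[float],
    fixed_weight: float = 0.5,
) -> Tuple[int, float]:
    """Fuse per-speaker scores for one identification trial.

    Returns (index of the identified speaker, WOCOR weight used).
    The weight varies per trial; fixed_weight is only a fallback
    (an assumption) for the case where neither stream discriminates.
    """
    if len(mfcc_scores) != len(wocor_scores):
        raise ValueError("both streams must score the same speakers")
    c_mfcc = top_two_margin(mfcc_scores)
    c_wocor = top_two_margin(wocor_scores)
    total = c_mfcc + c_wocor
    w = c_wocor / total if total > 0.0 else fixed_weight
    # Weighted sum of the two score streams, speaker by speaker.
    fused = [(1.0 - w) * m + w * s for m, s in zip(mfcc_scores, wocor_scores)]
    best = max(range(len(fused)), key=fused.__getitem__)
    return best, w


if __name__ == "__main__":
    # MFCC barely separates speakers 0 and 1; WOCOR clearly prefers speaker 1.
    speaker, weight = fuse_scores([-10.0, -10.1, -12.0], [-8.0, -5.0, -9.0])
    print(speaker, round(weight, 3))
```

In this toy trial the MFCC scores are nearly tied between the first two speakers, so most of the weight shifts to WOCOR; a fixed-weight system would apply the same weight regardless of how ambiguous either stream is in a given trial.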
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zheng, N., Ching, P.C., Wang, N., Lee, T. (2006). Integrating Complementary Features with a Confidence Measure for Speaker Identification. In: Huo, Q., Ma, B., Chng, ES., Li, H. (eds) Chinese Spoken Language Processing. ISCSLP 2006. Lecture Notes in Computer Science, vol 4274. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11939993_57
DOI: https://doi.org/10.1007/11939993_57
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49665-6
Online ISBN: 978-3-540-49666-3
eBook Packages: Computer Science, Computer Science (R0)