Integrating Complementary Features with a Confidence Measure for Speaker Identification

  • Conference paper
Chinese Spoken Language Processing (ISCSLP 2006)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4274)

Abstract

This paper investigates the effectiveness of integrating complementary acoustic features to improve speaker identification performance. It studies the complementary contributions of two acoustic features: the conventional vocal tract related feature MFCC and the recently proposed vocal source related feature WOCOR. An integrated system is proposed that performs score-level fusion of MFCC and WOCOR, with a confidence measure as the weighting parameter, to take full advantage of the complementarity between the two features. The confidence measure is derived from the speaker discrimination powers of MFCC and WOCOR in each individual identification trial, so that more weight goes to the feature with the higher confidence in speaker discrimination. Experiments show that information fusion with this confidence-measure-based varying weight outperforms fusion with a pre-trained fixed weight in speaker identification.
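
The abstract does not give the formula for the confidence measure, so the following is only a minimal sketch of per-trial, confidence-weighted score fusion under stated assumptions. The discrimination measure (the normalized gap between the best and second-best scores), the weight formula, and all function names are hypothetical illustrations, not the authors' method. The sketch also assumes that the MFCC and WOCOR scores are log-likelihood-style values on a comparable scale, with one score per enrolled speaker.

```python
def discrimination(scores):
    """Assumed proxy for speaker discrimination power in one trial:
    the gap between the best and second-best scores, normalized by
    the magnitude of the best score. Requires at least two speakers."""
    ranked = sorted(scores.values(), reverse=True)
    top, second = ranked[0], ranked[1]
    if top == 0:
        return 0.0
    return (top - second) / abs(top)


def confidence_weight(mfcc_scores, wocor_scores):
    """Hypothetical per-trial weight for WOCOR, in [0, 1]: the share of
    the total discrimination that comes from the WOCOR scores."""
    d_mfcc = discrimination(mfcc_scores)
    d_wocor = discrimination(wocor_scores)
    total = d_mfcc + d_wocor
    return 0.5 if total == 0 else d_wocor / total


def identify(mfcc_scores, wocor_scores, weight=None):
    """Score-level fusion. With weight=None the weight varies per trial;
    passing a number emulates a pre-trained fixed weight."""
    if weight is None:
        w = confidence_weight(mfcc_scores, wocor_scores)
    else:
        w = weight
    fused = {
        spk: (1.0 - w) * mfcc_scores[spk] + w * wocor_scores[spk]
        for spk in mfcc_scores
    }
    return max(fused, key=fused.get), w


# Toy trial: MFCC barely separates speakers A and B, WOCOR clearly prefers B.
mfcc = {"A": -10.0, "B": -10.1, "C": -12.0}
wocor = {"A": -8.0, "B": -5.0, "C": -9.0}
speaker, w = identify(mfcc, wocor)
print(speaker, round(w, 3))
```

In this toy trial the WOCOR scores separate the speakers far more clearly than the MFCC scores, so the varying weight leans on WOCOR and the fused decision follows it. A small fixed weight would instead leave the decision with the ambiguous MFCC ranking.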

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zheng, N., Ching, P.C., Wang, N., Lee, T. (2006). Integrating Complementary Features with a Confidence Measure for Speaker Identification. In: Huo, Q., Ma, B., Chng, ES., Li, H. (eds) Chinese Spoken Language Processing. ISCSLP 2006. Lecture Notes in Computer Science, vol 4274. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11939993_57

  • DOI: https://doi.org/10.1007/11939993_57

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-49665-6

  • Online ISBN: 978-3-540-49666-3

  • eBook Packages: Computer Science, Computer Science (R0)
