Bimodal Speaker Identification Using Dynamic Bayesian Network

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3338)

Abstract

The authentication of a person requires consistently high recognition accuracy, which is difficult to attain using a single recognition modality. This paper assesses the fusion of voiceprint and face features for bimodal speaker identification using a Dynamic Bayesian Network (DBN). Our contribution is a general feature-level fusion framework for bimodal speaker identification. Within the framework, the voice and face features are combined into a single DBN to obtain better performance than either single-modality system alone. The tests were conducted on a multi-modal database of 54 users who provided voiceprint and face data of different speech types and content. We compare our approach with mono-modal systems and other classic decision-level methods and show that feature-level fusion using a dynamic Bayesian network improves performance by about 4-5%, considerably better than the other methods.
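
The distinction the abstract draws between feature-level and decision-level fusion can be illustrated with a minimal Python sketch. This is not the paper's DBN model: the function names (`fuse_features`, `sum_rule`, `identify`), the per-frame alignment of the two streams, and the equal-weight sum rule are all assumptions chosen for illustration. Feature-level fusion joins the voice and face observations before any model sees them, whereas decision-level fusion combines the scores that separate per-modality classifiers have already produced.

```python
from typing import Dict, List, Sequence


def fuse_features(voice_frames: Sequence[Sequence[float]],
                  face_frames: Sequence[Sequence[float]]) -> List[List[float]]:
    """Feature-level fusion (illustrative): concatenate the voice and face
    vectors of each frame into one joint observation vector.

    Assumes both streams were already aligned to the same number of frames.
    """
    if len(voice_frames) != len(face_frames):
        raise ValueError("streams must be aligned to the same number of frames")
    return [list(v) + list(f) for v, f in zip(voice_frames, face_frames)]


def sum_rule(voice_scores: Dict[str, float],
             face_scores: Dict[str, float],
             weight: float = 0.5) -> Dict[str, float]:
    """Decision-level fusion (illustrative): weighted sum of the per-speaker
    scores produced by two independent single-modality classifiers."""
    return {spk: weight * voice_scores[spk] + (1.0 - weight) * face_scores[spk]
            for spk in voice_scores}


def identify(scores: Dict[str, float]) -> str:
    """Return the speaker label with the highest score."""
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Hypothetical toy data: two frames, 2-D voice and 1-D face features.
    joint = fuse_features([[0.1, 0.2], [0.3, 0.4]], [[0.9], [0.8]])
    print(joint)  # each frame is now one 3-D joint vector

    # Hypothetical per-speaker scores from separate voice and face classifiers.
    fused = sum_rule({"spk_a": 0.2, "spk_b": 0.8}, {"spk_a": 0.9, "spk_b": 0.1})
    print(identify(fused))
```

In a feature-level scheme, the joint vectors returned by `fuse_features` would be the observations of a single model (a DBN in the paper), so dependencies between the two modalities can be learned; in the decision-level scheme, each modality is modelled separately and only the final scores interact.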



Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Li, D., Sang, L., Yang, Y., Wu, Z. (2004). Bimodal Speaker Identification Using Dynamic Bayesian Network. In: Li, S.Z., Lai, J., Tan, T., Feng, G., Wang, Y. (eds) Advances in Biometric Person Authentication. SINOBIOMETRICS 2004. Lecture Notes in Computer Science, vol 3338. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30548-4_66

  • DOI: https://doi.org/10.1007/978-3-540-30548-4_66

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-24029-7

  • Online ISBN: 978-3-540-30548-4

  • eBook Packages: Computer Science, Computer Science (R0)
