Abstract
The authentication of a person requires consistently high recognition accuracy, which is difficult to attain with a single recognition modality. This paper assesses the fusion of voiceprint and face features for bimodal speaker identification using a Dynamic Bayesian Network (DBN). Our contribution is a general feature-level fusion framework for bimodal speaker identification. Within this framework, voice and face features are combined into a single DBN to obtain better performance than either single-modality system alone. Tests were conducted on a multimodal database of 54 users who provided voiceprint and face data covering different speech types and content. We compare our approach with mono-modal systems and with classic decision-level fusion methods, and show that feature-level fusion using a DBN improves performance by about 4-5%, clearly outperforming the other approaches.
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Li, D., Sang, L., Yang, Y., Wu, Z. (2004). Bimodal Speaker Identification Using Dynamic Bayesian Network. In: Li, S.Z., Lai, J., Tan, T., Feng, G., Wang, Y. (eds) Advances in Biometric Person Authentication. SINOBIOMETRICS 2004. Lecture Notes in Computer Science, vol 3338. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30548-4_66
DOI: https://doi.org/10.1007/978-3-540-30548-4_66
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24029-7
Online ISBN: 978-3-540-30548-4
eBook Packages: Computer Science, Computer Science (R0)