Bimodal Speaker Identification Using Dynamic Bayesian Network

Li, Dongdong; Sang, LiFeng; Yang, Yingchun; Wu, Zhaohui

doi:10.1007/978-3-540-30548-4_66

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3338)

Abstract

Authenticating a person requires consistently high recognition accuracy, which is difficult to attain with a single recognition modality. This paper assesses the fusion of voiceprint and face features for bimodal speaker identification using a Dynamic Bayesian Network (DBN). Our contribution is a general feature-level fusion framework for bimodal speaker identification. Within the framework, the voice and face features are combined in a single DBN to obtain better performance than either single-modality system alone. The tests were conducted on a multimodal database of 54 users who provided voiceprint and face data covering different speech types and content. We compare our approach with mono-modal systems and with classic decision-level fusion methods, and show that feature-level fusion using a dynamic Bayesian network improves performance by about 4-5%, much better than the others.
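
To make the distinction between feature-level and decision-level fusion concrete, here is a minimal sketch in Python. It is not the authors' method: the DBN is swapped for a simple diagonal-Gaussian classifier per speaker, and the speaker labels, feature values, function names, and the weight `w` are all hypothetical, chosen only for illustration. Feature-level fusion concatenates the voice and face vectors and scores one joint model; decision-level fusion scores each modality separately and then combines the per-modality posteriors with a weighted sum rule.

```python
import math


def log_gauss(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian evaluated at vector x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def fit(vectors):
    """Estimate a per-dimension mean and (floored) variance from training vectors."""
    n = len(vectors)
    dims = len(vectors[0])
    mean = [sum(v[d] for v in vectors) / n for d in range(dims)]
    var = [
        max(sum((v[d] - mean[d]) ** 2 for v in vectors) / n, 1e-3)
        for d in range(dims)
    ]
    return mean, var


def posteriors(models, x):
    """Turn per-speaker log-likelihoods into posteriors (equal priors assumed)."""
    scores = {spk: log_gauss(x, *model) for spk, model in models.items()}
    top = max(scores.values())
    exps = {spk: math.exp(s - top) for spk, s in scores.items()}
    total = sum(exps.values())
    return {spk: e / total for spk, e in exps.items()}


def identify_feature_level(joint_models, voice, face):
    """Feature-level fusion: concatenate both modalities, score one joint model."""
    joint = voice + face
    return max(joint_models, key=lambda spk: log_gauss(joint, *joint_models[spk]))


def identify_decision_level(voice_models, face_models, voice, face, w=0.5):
    """Decision-level fusion: weighted sum rule over per-modality posteriors."""
    pv = posteriors(voice_models, voice)
    pf = posteriors(face_models, face)
    return max(pv, key=lambda spk: w * pv[spk] + (1 - w) * pf[spk])


# Hypothetical toy data: (voice_features, face_features) pairs per speaker.
train = {
    "spk_a": [([1.0, 2.0], [0.5, 0.1]), ([1.2, 1.8], [0.4, 0.2]), ([0.9, 2.1], [0.6, 0.0])],
    "spk_b": [([3.0, 0.5], [1.5, 1.1]), ([3.2, 0.4], [1.4, 1.0]), ([2.9, 0.6], [1.6, 1.2])],
}

# One joint model per speaker for feature-level fusion.
joint_models = {spk: fit([v + f for v, f in s]) for spk, s in train.items()}
# Separate per-modality models for decision-level fusion.
voice_models = {spk: fit([v for v, _ in s]) for spk, s in train.items()}
face_models = {spk: fit([f for _, f in s]) for spk, s in train.items()}

probe_voice, probe_face = [1.1, 1.9], [0.5, 0.1]
print(identify_feature_level(joint_models, probe_voice, probe_face))
print(identify_decision_level(voice_models, face_models, probe_voice, probe_face))
```

In this toy setting both strategies agree because the two modalities are modelled independently. A joint model such as a DBN can additionally capture dependencies between voice and face features over time, which this stand-in classifier does not attempt.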

Author information

Authors and Affiliations

Department of Computer Science, Zhejiang University, Hang Zhou, P.R. China
Dongdong Li, LiFeng Sang, Yingchun Yang & Zhaohui Wu

Editor information

Editors and Affiliations

Center for Biometrics and Security Research & National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences,
Stan Z. Li
Department of Electronics & Communication Engineering, Sun Yat-Sen University, Guangzhou, China
Jianhuang Lai
National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China
Tieniu Tan
Center of Computer Vision, School of Mathematics and Computing Science, Sun Yat-sen University, 510275, Guangzhou, China
Guocan Feng
School of Computer Science and Engineering, Beihang University, Beijing, China
Yunhong Wang

About this paper

Cite this paper

Li, D., Sang, L., Yang, Y., Wu, Z. (2004). Bimodal Speaker Identification Using Dynamic Bayesian Network. In: Li, S.Z., Lai, J., Tan, T., Feng, G., Wang, Y. (eds) Advances in Biometric Person Authentication. SINOBIOMETRICS 2004. Lecture Notes in Computer Science, vol 3338. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30548-4_66

DOI: https://doi.org/10.1007/978-3-540-30548-4_66
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24029-7
Online ISBN: 978-3-540-30548-4
eBook Packages: Computer Science, Computer Science (R0)
