Abstract
In this paper, physiological biometrics from the face are combined with behavioral biometrics from speech in video to achieve robust user authentication. This choice of biometrics is motivated by user convenience and by robustness to forgery, since it is hard to forge both biometrics simultaneously. We used Mel Frequency Cepstral Coefficients for text-independent speaker recognition and local scale-invariant features for video-based face recognition. The results of the two classifiers were fused using a weighted sum rule, and an equal error rate of 0.6% was achieved on the VidTIMIT audio-visual database. We also performed identification experiments and achieved a combined identification rate of 99.13% on the same database.
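To make the fusion step concrete, the following is a minimal sketch of weighted sum score fusion and an approximate equal error rate (EER) computation. It is not the authors' implementation: the abstract does not state the weights, the score normalization, or the threshold sweep used, so the min-max normalization, the `face_weight` parameter, the assumption that higher scores mean better matches, and all function names are illustrative assumptions.

```python
from typing import List, Sequence


def min_max_normalize(scores: Sequence[float]) -> List[float]:
    """Map scores to [0, 1]. Assumed normalization; the paper may use another."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores identical: no ranking information, return zeros.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def weighted_sum_fusion(
    face_scores: Sequence[float],
    speech_scores: Sequence[float],
    face_weight: float = 0.5,  # hypothetical weight, not taken from the paper
) -> List[float]:
    """Fuse two lists of similarity scores with a weighted sum rule."""
    if len(face_scores) != len(speech_scores):
        raise ValueError("score lists must have the same length")
    if not 0.0 <= face_weight <= 1.0:
        raise ValueError("face_weight must lie in [0, 1]")
    face = min_max_normalize(face_scores)
    speech = min_max_normalize(speech_scores)
    return [
        face_weight * f + (1.0 - face_weight) * s
        for f, s in zip(face, speech)
    ]


def equal_error_rate(
    genuine: Sequence[float], impostor: Sequence[float]
) -> float:
    """Approximate the EER by sweeping thresholds over the observed scores.

    A claim is accepted when its score is >= the threshold. The function
    returns the mean of FAR and FRR at the threshold where they are closest.
    """
    best_gap = float("inf")
    eer = 1.0
    for threshold in sorted(set(genuine) | set(impostor)):
        far = sum(s >= threshold for s in impostor) / len(impostor)
        frr = sum(s < threshold for s in genuine) / len(genuine)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return eer
```

In a verification setting, the fused scores of genuine and impostor claims would be passed to `equal_error_rate`, and `face_weight` would typically be tuned on held-out data rather than fixed in advance.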
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Naseem, I., Mian, A. (2008). User Verification by Combining Speech and Face Biometrics in Video. In: Bebis, G., et al. Advances in Visual Computing. ISVC 2008. Lecture Notes in Computer Science, vol 5359. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89646-3_47
DOI: https://doi.org/10.1007/978-3-540-89646-3_47
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-89645-6
Online ISBN: 978-3-540-89646-3
eBook Packages: Computer Science (R0)