User Verification by Combining Speech and Face Biometrics in Video

Naseem, Imran; Mian, Ajmal

doi:10.1007/978-3-540-89646-3_47

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 5359)

Included in the following conference series:

International Symposium on Visual Computing

Abstract

In this paper, physiological biometrics from the face are combined with behavioral biometrics from speech in video to achieve robust user authentication. The choice of biometrics is motivated by user convenience and by robustness to forgery, since it is hard to forge both biometrics simultaneously. We used Mel Frequency Cepstral Coefficients (MFCCs) for text-independent speaker recognition and local scale-invariant features for video-based face recognition. The results of the two classifiers were fused using a weighted sum rule, and an equal error rate of 0.6% was achieved on the VidTIMIT audio-visual database. We also performed identification experiments and achieved a combined identification rate of 99.13% on the same database.
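The fusion and evaluation steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state how scores were normalized or which weights were used, so the min-max normalization, the `speech_weight` parameter, and all function names below are assumptions made for illustration. Scores are assumed to be similarities, where higher means a better match.

```python
def min_max_normalize(scores):
    """Map raw scores to [0, 1]; a constant list maps to all zeros.

    Min-max normalization is an assumption; the abstract does not say
    how the two classifiers' scores were made comparable.
    """
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def weighted_sum_fusion(speech_scores, face_scores, speech_weight=0.5):
    """Fuse two score lists with a weighted sum rule.

    speech_weight is a hypothetical parameter; the face scores receive
    the complementary weight (1 - speech_weight).
    """
    if len(speech_scores) != len(face_scores):
        raise ValueError("score lists must have the same length")
    if not 0.0 <= speech_weight <= 1.0:
        raise ValueError("speech_weight must lie in [0, 1]")
    s = min_max_normalize(speech_scores)
    f = min_max_normalize(face_scores)
    return [speech_weight * a + (1.0 - speech_weight) * b
            for a, b in zip(s, f)]


def equal_error_rate(genuine, impostor):
    """Approximate the EER by scanning thresholds at the observed scores.

    A claim is accepted when its score is >= the threshold. The result
    is the mean of the false accept rate (FAR) and false reject rate
    (FRR) at the threshold where the two rates are closest.
    """
    best_gap, eer = None, None
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)
        frr = sum(s < t for s in genuine) / len(genuine)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return eer
```

In use, the genuine and impostor trials would be fused together in a single call, so that both share the same normalization, and only then split by label before calling `equal_error_rate`.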

Author information

Authors and Affiliations

School of Electrical, Electronic and Computer Engineering, Australia
Imran Naseem
School of Computer Science and Software Engineering, The University of Western Australia, Australia
Ajmal Mian

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, University of Nevada, Reno, USA
George Bebis
NASA Ames Research Center, Moffett Field, CA, USA
Richard Boyle
Lawrence Berkeley National Laboratory, Berkeley, CA, USA
Bahram Parvin
Desert Research Institute, Reno, NV, USA
Darko Koracin
Digital Image Research Center, Kingston University, London, UK
Paolo Remagnino
Mitsubishi Electric Research Laboratories, P.O. Box 02139, Cambridge, MA, USA
Fatih Porikli
Computer and Information Science and Engineering, University of Florida, P.O. Box, FL 32611-6120, Gainesville, USA
Jörg Peters
IBM T.J. Watson Research Center, 19 Skyline Drive, NY 10532, Hawthorne, USA
James Klosowski
128 Memorial Mall, Stewart B001, IN 47907, West Lafayette, USA
Laura Arns
Denver Museum of Nature and Science, 2001 Colorado Blvd., Denver, CO 80205, USA
Yu Ka Chun
Department of Computer Science, NC State University, Campus Box 8206, NC 27695-8206, Raleigh, USA
Theresa-Marie Rhyne
Los Alamos National Labs, P.O. Box 1663, NM 87545, Los Alamos, USA
Laura Monroe

About this paper

Cite this paper

Naseem, I., Mian, A. (2008). User Verification by Combining Speech and Face Biometrics in Video. In: Bebis, G., et al. Advances in Visual Computing. ISVC 2008. Lecture Notes in Computer Science, vol 5359. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89646-3_47

DOI: https://doi.org/10.1007/978-3-540-89646-3_47
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-89645-6
Online ISBN: 978-3-540-89646-3
eBook Packages: Computer Science, Computer Science (R0)