Robust Automatic Human Identification Using Face, Mouth, and Acoustic Information

Fox, Niall A.; Gross, Ralph; Cohn, Jeffrey F.; Reilly, Richard B.

doi:10.1007/11564386_21


Part of the book series: Lecture Notes in Computer Science (LNIP, volume 3723)

Included in the following conference series:

International Workshop on Analysis and Modeling of Faces and Gestures

Abstract

Discriminatory information about person identity is multimodal. Yet most person recognition systems are unimodal, relying for example on facial appearance alone. With a view to exploiting the complementary nature of different modes of information and increasing the robustness of pattern recognition to test signal degradation, we developed a multiple-expert biometric person identification system that combines information from three experts: face, visual speech, and audio. The system performs multimodal fusion in an automatic, unsupervised manner, adapting to the local performance and output reliability of each expert. The expert weightings are chosen automatically such that the reliability measure of the combined scores is maximized. To test system robustness to train/test mismatch, we degraded the audio signal with a broad range of Gaussian noise levels and the visual signal with a broad range of JPEG compression levels. Experiments were carried out on the XM2VTS database. The multimodal expert system outperformed each of the single experts in all comparisons. At the severe audio and visual mismatch levels tested, the audio, mouth, face, and tri-expert fusion accuracies were 37.1%, 48%, 75%, and 92.7% respectively, representing a relative improvement of 23.6% over the best-performing expert.
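The weighting idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not define the reliability measure, the score normalization, or the search procedure, so everything below is an assumption made for illustration. The sketch min-max normalizes each expert's scores, takes the gap between the best and second-best combined scores as a stand-in reliability measure, and grid-searches weights that sum to one. The names `min_max`, `reliability`, and `fuse` are hypothetical.

```python
from itertools import product


def min_max(scores):
    """Rescale one expert's scores to [0, 1] (assumed normalization)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def reliability(combined):
    """Assumed reliability measure: gap between the top two combined scores."""
    ranked = sorted(combined, reverse=True)
    return ranked[0] - ranked[1]


def fuse(expert_scores, steps=10):
    """Pick expert weights (summing to 1) that maximize the reliability
    of the weighted-sum scores, then return (identity index, weights).

    expert_scores: one list of per-identity scores for each expert.
    steps: grid resolution; weights are multiples of 1/steps.
    """
    normed = [min_max(s) for s in expert_scores]
    n_experts = len(normed)
    n_ids = len(normed[0])
    best = None
    # Enumerate the first n-1 weights; the last one is whatever remains.
    for head in product(range(steps + 1), repeat=n_experts - 1):
        if sum(head) > steps:
            continue
        weights = [k / steps for k in head] + [(steps - sum(head)) / steps]
        combined = [
            sum(w * scores[i] for w, scores in zip(weights, normed))
            for i in range(n_ids)
        ]
        score = reliability(combined)
        if best is None or score > best[0]:
            best = (score, weights, combined)
    _, weights, combined = best
    identity = max(range(n_ids), key=combined.__getitem__)
    return identity, weights


if __name__ == "__main__":
    # Made-up scores for four enrolled identities (not data from the paper).
    audio = [0.50, 0.52, 0.51, 0.49]  # noisy: nearly flat scores
    mouth = [0.2, 0.3, 0.6, 0.4]
    face = [0.1, 0.2, 0.9, 0.3]
    print(fuse([audio, mouth, face]))
```

With this particular stand-in measure the search can settle on a single expert when that expert is much more decisive than the others, so it should be read as a toy illustration of reliability-driven weighting rather than a reproduction of the reported system.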

Author information

Authors and Affiliations

Dept. of Electronic and Electrical Engineering, University College Dublin, Belfield, Dublin 4, Ireland
Niall A. Fox & Richard B. Reilly
Robotics Institute, Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, PA, 15213
Ralph Gross & Jeffrey F. Cohn

Editor information

Editors and Affiliations

Intuitive Surgical Inc, 950 Kifer Road, 94086, Sunnyvale, CA, USA
Wenyi Zhao
Department of Computer Science, Queen Mary, University of London, E1 4NS, London, UK
Shaogang Gong
Microsoft Research Asia, P.O. Box, Beijing, P.R. China
Xiaoou Tang

About this paper

Cite this paper

Fox, N.A., Gross, R., Cohn, J.F., Reilly, R.B. (2005). Robust Automatic Human Identification Using Face, Mouth, and Acoustic Information. In: Zhao, W., Gong, S., Tang, X. (eds) Analysis and Modelling of Faces and Gestures. AMFG 2005. Lecture Notes in Computer Science, vol 3723. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11564386_21

DOI: https://doi.org/10.1007/11564386_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29229-6
Online ISBN: 978-3-540-32074-6
eBook Packages: Computer Science, Computer Science (R0)
