Abstract
This paper describes a multisensory person-identification system in which visual and acoustic cues are used jointly. A simple approach is presented, based on fusing the lists of scores produced independently by a speaker-recognition system and a face-recognition system. The reported experiments show that integrating visual and acoustic information improves both the performance and the reliability of identification relative to either system used alone. Finally, two network architectures based on radial basis function theory are proposed to describe integration at various levels of abstraction.
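The abstract does not specify how the two score lists are combined. The following is a minimal sketch of score-level fusion under stated assumptions, not the authors' method. It assumes that each system returns one similarity score per enrolled identity (higher means a better match), that each list is min-max normalized to [0, 1], and that the normalized lists are merged by a weighted sum with a hypothetical weight `w_speaker`. A distance-based recognizer, such as one that reports vector-quantization distortion, would need its scores negated before being passed in. The function names and the example identities are illustrative only.

```python
from typing import Dict, List, Tuple


def normalize_scores(scores: Dict[str, float]) -> Dict[str, float]:
    """Min-max normalize one system's score list to [0, 1].

    Assumption: scores are similarities (higher = better match).
    If all scores are equal, every identity gets 1.0.
    """
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    if span == 0:
        return {identity: 1.0 for identity in scores}
    return {identity: (s - lo) / span for identity, s in scores.items()}


def fuse_score_lists(
    speaker: Dict[str, float],
    face: Dict[str, float],
    w_speaker: float = 0.5,  # hypothetical weight; not taken from the paper
) -> List[Tuple[str, float]]:
    """Weighted-sum fusion of two independently produced score lists.

    Only identities scored by both systems are kept. The result is
    sorted from best to worst fused score.
    """
    s = normalize_scores(speaker)
    f = normalize_scores(face)
    common = s.keys() & f.keys()  # identities known to both systems
    fused = {k: w_speaker * s[k] + (1.0 - w_speaker) * f[k] for k in common}
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Illustrative scores only; these are not experimental data.
    speaker_scores = {"alice": 0.9, "bob": 0.4, "carol": 0.1}
    face_scores = {"alice": 0.2, "bob": 0.8, "carol": 0.3}
    ranking = fuse_score_lists(speaker_scores, face_scores)
    print([identity for identity, _ in ranking])  # bob, alice, carol
```

In this toy example the speaker system alone would rank "alice" first and the face system alone would rank "bob" first. The fused list resolves the disagreement by combining the two sets of evidence. Normalizing before summing keeps one system's raw score scale from dominating the other's.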
Additional information
Italian Patent No. TO92A000695. European extension in progress.
About this article
Cite this article
Brunelli, R., Falavigna, D., Poggio, T. et al. Automatic person recognition by acoustic and geometric features. Machine Vis. Apps. 8, 317–325 (1995). https://doi.org/10.1007/BF01211493