Automatic person recognition by acoustic and geometric features

Machine Vision and Applications

Abstract

This paper describes a multisensorial person-identification system in which visual and acoustic cues are used jointly for person identification. A simple approach, based on the fusion of the lists of scores produced independently by a speaker-recognition system and a face-recognition system, is presented. Experiments are reported that show that the integration of visual and acoustic information enhances both the performance and the reliability of the separate systems. Finally, two network architectures, based on radial basis-function theory, are proposed to describe integration at various levels of abstraction.
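
The score-list fusion mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how scores are normalized or combined, so the z-score normalization, the weighted sum, the weight `w_speaker`, and the function names `normalize` and `fuse` are all assumptions made for illustration. Higher scores are assumed to mean a better match.

```python
import math


def normalize(scores):
    """Rescale one recognizer's scores to zero mean and unit variance.

    Assumption: z-score normalization is used here only to make the two
    score lists comparable; the paper may use a different scheme.
    """
    values = list(scores.values())
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(var) or 1.0  # fall back to 1.0 if all scores are equal
    return {name: (v - mean) / std for name, v in scores.items()}


def fuse(speaker_scores, face_scores, w_speaker=0.5):
    """Combine two score lists and rank candidates, best first.

    Assumption: a weighted sum of normalized scores; w_speaker is a
    hypothetical weight for the acoustic recognizer.
    """
    s = normalize(speaker_scores)
    f = normalize(face_scores)
    common = s.keys() & f.keys()  # candidates scored by both recognizers
    fused = {n: w_speaker * s[n] + (1.0 - w_speaker) * f[n] for n in common}
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)
```

With made-up scores such as `{"anna": 0.9, "bruno": 0.8, "carla": 0.1}` from the speaker recognizer and `{"anna": 0.3, "bruno": 0.9, "carla": 0.2}` from the face recognizer, `fuse` ranks "bruno" first: the face score outweighs the narrow acoustic lead of "anna" once both lists are on a common scale.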

Additional information

Italian Patent No. TO92A000695. European extension in progress.

About this article

Cite this article

Brunelli, R., Falavigna, D., Poggio, T. et al. Automatic person recognition by acoustic and geometric features. Machine Vis. Apps. 8, 317–325 (1995). https://doi.org/10.1007/BF01211493

  • DOI: https://doi.org/10.1007/BF01211493
