
Multimodal identity tracking in a smart room

  • Original Article
  • Published in: Personal and Ubiquitous Computing

An Erratum to this article was published on 08 January 2009

Abstract

The automatic detection, tracking, and identification of multiple people in intelligent environments are important building blocks on which smart interaction systems can be designed; examples include gesture recognizers, head pose estimators, and far-field speech recognizers and dialog systems. In this paper, we present a system that tracks multiple people in a smart room environment while inferring their identities in a completely automatic and unobtrusive way. It relies on a set of fixed and active cameras to track users and capture close-ups of their faces for identification, and on several microphone arrays to determine active speakers and steer the system's attention. Information arriving asynchronously from several sources, such as position updates from audio or visual trackers and identification events from identification modules, is fused at a higher level to gradually refine the room's situation model. The system was trained on a small set of users and showed good performance at acquiring and keeping their identities in a smart room environment.
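The core fusion idea in the abstract, asynchronous position and identification events folded into a gradually refined situation model, can be sketched as a minimal nearest-neighbour event fuser. This is an illustrative sketch only, not the paper's implementation: the `Track` and `fuse` names, the distance gate, and the additive identity-score histogram are all assumptions made for the example.

```python
from dataclasses import dataclass, field

@dataclass
class Track:
    """A tracked person: last known position plus an identity score histogram."""
    position: tuple
    id_scores: dict = field(default_factory=dict)

    @property
    def identity(self):
        # Current best identity hypothesis, or None before any ID event arrives.
        return max(self.id_scores, key=self.id_scores.get) if self.id_scores else None

def nearest(tracks, pos):
    """Return (track, squared distance) of the track closest to pos."""
    best, best_d2 = None, float("inf")
    for tr in tracks:
        d2 = (tr.position[0] - pos[0]) ** 2 + (tr.position[1] - pos[1]) ** 2
        if d2 < best_d2:
            best, best_d2 = tr, d2
    return best, best_d2

def fuse(events, gate=1.0):
    """Fold a stream of asynchronous events into a list of identified tracks.

    Each event is a dict {"t": time, "kind": "pos" | "id", "pos": (x, y)},
    with "label" and "score" added for identification events. A position
    update within `gate` metres of an existing track refines that track,
    otherwise it spawns a new one; an identification event adds its
    confidence to the nearest track's histogram.
    """
    tracks = []
    for ev in sorted(events, key=lambda e: e["t"]):  # fuse in time order
        tr, d2 = nearest(tracks, ev["pos"])
        if ev["kind"] == "pos":
            if tr is not None and d2 <= gate ** 2:
                tr.position = ev["pos"]
            else:
                tracks.append(Track(position=ev["pos"]))
        elif ev["kind"] == "id" and tr is not None:
            tr.id_scores[ev["label"]] = tr.id_scores.get(ev["label"], 0.0) + ev["score"]
    return tracks

# Two people, located by visual trackers, then identified by face/speaker ID:
events = [
    {"t": 0.0, "kind": "pos", "pos": (1.0, 1.0)},
    {"t": 0.5, "kind": "pos", "pos": (4.0, 4.0)},
    {"t": 1.0, "kind": "id", "pos": (1.1, 0.9), "label": "alice", "score": 0.8},
    {"t": 1.5, "kind": "id", "pos": (4.1, 4.0), "label": "bob", "score": 0.9},
]
tracks = fuse(events)
print([tr.identity for tr in tracks])  # -> ['alice', 'bob']
```

In the paper's setting the per-event confidences would come from the face-identification and speaker-localization modules; here they are hand-set numbers, and a real system would also handle track deletion and ambiguous associations.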





Acknowledgments

The work presented here was partly funded by the European Union (EU) under the integrated project CHIL, Computers in the Human Interaction Loop (Grant number IST-506909).

Author information

Corresponding author

Correspondence to Keni Bernardin.

Additional information

An erratum to this article can be found at http://dx.doi.org/10.1007/s00779-008-0216-1



Cite this article

Bernardin, K., Ekenel, H.K. & Stiefelhagen, R. Multimodal identity tracking in a smart room. Pers Ubiquit Comput 13, 25–31 (2009). https://doi.org/10.1007/s00779-007-0175-y

