Abstract
The automatic detection, tracking, and identification of multiple people in intelligent environments are important building blocks on which smart interaction systems can be designed. Examples of such systems include gesture recognizers, head pose estimators, and far-field speech recognition and dialog systems. In this paper, we present a system that is capable of tracking multiple people in a smart room environment while inferring their identities in a completely automatic and unobtrusive way. It relies on a set of fixed and active cameras to track the users and obtain close-ups of their faces for identification, and on several microphone arrays to determine active speakers and steer the attention of the system. Information arriving asynchronously from several sources, such as position updates from audio or visual trackers and identification events from identification modules, is fused at a higher level to gradually refine the room's situation model. The system was trained on a small set of users and showed good performance at acquiring and keeping their identities in a smart room environment.
Acknowledgments
The work presented here was partly funded by the European Union (EU) under the integrated project CHIL, Computers in the Human Interaction Loop (Grant number IST-506909).
An erratum to this article can be found at http://dx.doi.org/10.1007/s00779-008-0216-1
Cite this article
Bernardin, K., Ekenel, H.K. & Stiefelhagen, R. Multimodal identity tracking in a smart room. Pers Ubiquit Comput 13, 25–31 (2009). https://doi.org/10.1007/s00779-007-0175-y