Abstract
This paper presents the Athens Information Technology (AIT) system for 3D person tracking and the results it obtained in the CLEAR 2007 evaluations. The system uses audiovisual information from multiple acoustic and video sensors. It comprises a video subsystem and an audio subsystem, whose results are combined to track the last active speaker. The video subsystem fuses the outputs of several 2D face localization systems in 3D, aiming to track all people present in the room. The audio subsystem applies an information-theoretic metric to an ensemble of microphones to localize the active speaker.
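The abstract names three steps: fusing 2D face detections in 3D, estimating the active speaker from microphone signals with an information-theoretic metric, and combining the two. The following is a minimal Python sketch of those building blocks under stated assumptions, not the AIT implementation. It assumes each 2D face detection has already been back-projected into a 3D ray through a calibrated camera, and takes the 3D position as the least-squares intersection of those rays. For audio it assumes the metric is the mutual information of two jointly Gaussian signals, -0.5 ln(1 - rho^2), maximized over candidate inter-microphone delays; with one sample per signal this ranks lags exactly like the magnitude of the normalized cross-correlation, and the metric used in the paper may differ. The fusion step simply picks the video track nearest an audio position estimate. All function names (`triangulate`, `gaussian_mi`, `estimate_tdoa`, `pick_speaker`) are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _solve3(a: List[List[float]], b: List[float]) -> Vec3:
    """Solve the 3x3 linear system a @ x = b with Cramer's rule."""
    def det(m: List[List[float]]) -> float:
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))
    d = det(a)
    if abs(d) < 1e-12:
        raise ValueError("rays are (nearly) parallel; position is undetermined")
    out = []
    for col in range(3):
        m = [row[:] for row in a]
        for r in range(3):
            m[r][col] = b[r]  # replace one column by the right-hand side
        out.append(det(m) / d)
    return (out[0], out[1], out[2])


def triangulate(rays: Sequence[Tuple[Vec3, Vec3]]) -> Vec3:
    """Least-squares 3D point closest to a set of rays (camera_centre, direction).

    Assumption: each ray is a 2D face detection back-projected through a
    calibrated camera. Solves sum(P_i) x = sum(P_i c_i), where
    P_i = I - u_i u_i^T projects onto the plane orthogonal to ray i.
    """
    a = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for c, d in rays:
        n = math.sqrt(sum(v * v for v in d))
        u = [v / n for v in d]  # unit direction
        for i in range(3):
            for j in range(3):
                p = (1.0 if i == j else 0.0) - u[i] * u[j]
                a[i][j] += p
                b[i] += p * c[j]
    return _solve3(a, b)


def gaussian_mi(x: Sequence[float], y: Sequence[float]) -> float:
    """Mutual information (nats) of two jointly Gaussian signals: -0.5 * ln(1 - rho^2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sxx = sum((p - mx) ** 2 for p in x)
    syy = sum((q - my) ** 2 for q in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant signal carries no information
    rho2 = min(sxy * sxy / (sxx * syy), 1.0 - 1e-12)  # clip to avoid log(0)
    return -0.5 * math.log(1.0 - rho2)


def estimate_tdoa(x1: Sequence[float], x2: Sequence[float], max_lag: int) -> int:
    """Lag in samples (positive: x2 is delayed) that maximizes Gaussian MI."""
    if len(x1) != len(x2):
        raise ValueError("signals must have equal length")
    best_lag, best_mi = 0, -1.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x1[: len(x1) - lag], x2[lag:]  # pair x1[n] with x2[n + lag]
        else:
            a, b = x1[-lag:], x2[: len(x2) + lag]
        mi = gaussian_mi(a, b)
        if mi > best_mi:
            best_lag, best_mi = lag, mi
    return best_lag


def pick_speaker(tracks: Dict[str, Vec3], audio_pos: Vec3) -> str:
    """Id of the video track closest to an (assumed given) audio position estimate."""
    return min(tracks, key=lambda k: math.dist(tracks[k], audio_pos))
```

For an exact copy of a signal delayed by 7 samples at the second microphone, `estimate_tdoa(x1, x2, 20)` returns 7. Turning several such pairwise delays into the 3D audio position that `pick_speaker` expects is not shown here.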
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Katsarakis, N., Talantzis, F., Pnevmatikakis, A., Polymenakos, L. (2008). The AIT 3D Audio / Visual Person Tracker for CLEAR 2007. In: Stiefelhagen, R., Bowers, R., Fiscus, J. (eds) Multimodal Technologies for Perception of Humans. RT CLEAR 2007 2007. Lecture Notes in Computer Science, vol 4625. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68585-2_2
DOI: https://doi.org/10.1007/978-3-540-68585-2_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68584-5
Online ISBN: 978-3-540-68585-2
eBook Packages: Computer Science, Computer Science (R0)