Context-driven model switching for visual tracking
Introduction
Many computer vision algorithms have proven to work well in controlled environments or specific contexts. Most often, however, real-world environmental conditions are anything but controlled or known a priori. Computer vision algorithms such as tracking should therefore be able to cope with dynamic, even dramatic, changes of the environment. In robotics, outdoor environments pose a serious challenge to most existing computer vision algorithms. But even indoors, unpredicted and drastic changes may occur when the robot moves between rooms, when the illumination varies with weather conditions, or when lights are turned off or on. Other dynamic changes with respect to pose, motion, or occlusion are similarly demanding. In order to deal with dynamically changing environments and with false positive tracking, the paper proposes a technique for selecting the model most appropriate for the current context. The ultimate goal of the proposed approach is to overcome the limitations of a set of individual models by selecting the best-suited model on-the-fly, using information-theoretic concepts. To evaluate the effectiveness of the method, the approach is integrated into a tracking system. As an example application we consider tracking of a walking human. In that scenario, abrupt direction changes of the walking human and sudden illumination changes may cause tracking failures with which the system has to deal.
The remainder of the paper is organized as follows. Section 2 reviews related work; Section 3 introduces the concept of mutual information and how it is used in the proposed model switching scheme. As an application, a multi-cue human face tracking system is introduced in Section 4, where experimental results show that the proposed model switching scheme increases the robustness of the system against false positive tracking in dynamically changing environments. Conclusions and future work are discussed in Section 5.
Section snippets
Related work
There seems to be a general consensus that it is important to integrate information coming from different sensors and models in order to increase the robustness of today’s computer vision algorithms. Whereas in robotics sensor fusion is a common research topic, relatively little research has been done in computer vision. Toyama and Hager [12], for example, propose a layered hierarchy of vision-based tracking algorithms aiming to enable robust and adaptive tracking in real time. When the conditions
A mutual information criterion for switching between detection models
For visual tracking in the real world, one would like to choose the model that best matches the current context. To this end, the proposed approach maintains a set of candidate detection models, each of which models the object of interest under a distinct environmental condition. That is, these models have been learned in a previous stage using, for example, the Maximum Likelihood Estimator, and their parameters remain fixed in our selection scheme. Applying each candidate model to the current
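Although the snippet breaks off, the selection principle it describes can be sketched: apply each fixed candidate model to the current input, and keep the model whose output distribution shares the most information with the a priori expectation. The following is a minimal sketch, not the paper's implementation; the histogram-based mutual information estimate and the names `candidate_maps` and `prior_map` are illustrative assumptions.

```python
import numpy as np

def mutual_information(p_map, q_map, bins=16):
    """Empirical mutual information (in bits) between two probability
    maps, estimated from their joint 2-D histogram."""
    joint, _, _ = np.histogram2d(p_map.ravel(), q_map.ravel(),
                                 bins=bins, range=[[0, 1], [0, 1]])
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)   # marginal over rows
    py = joint.sum(axis=0, keepdims=True)   # marginal over columns
    nz = joint > 0                          # avoid log(0)
    return float((joint[nz] * np.log2(joint[nz] / (px * py)[nz])).sum())

def select_model(candidate_maps, prior_map):
    """Return the index of the candidate detection map that maximizes
    mutual information with the a priori expectation."""
    scores = [mutual_information(m, prior_map) for m in candidate_maps]
    return int(np.argmax(scores))
```

A detection map that mirrors the expected distribution scores high, while an uninformative (e.g. noise-like) map scores near zero, so the argmax picks the model best matched to the current context.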
Application: robust tracking of human heads
In order to test the context-driven model switching scheme proposed in the previous sections, this section describes its application to people tracking. In particular, we extended an existing multi-cue head tracking system [10] with context-driven switching between skin color models.
One important problem of visual tracking in general, and head tracking in particular, is sudden change in the perceived environment. Unexpected and unpredictable situations are likely to disturb even highly
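The application snippet breaks off, but the ingredient it names, a set of skin color models, can be illustrated. The sketch below is an assumption-laden example rather than the paper's model: it takes each candidate to be a single Gaussian in normalized rg chromaticity, a common brightness-invariant parameterization for skin color, with illustrative parameter values.

```python
import numpy as np

def rg_chromaticity(image):
    """Convert an RGB image (H, W, 3) to normalized rg chromaticity,
    which discounts overall brightness."""
    s = image.sum(axis=2, keepdims=True) + 1e-8
    return image[..., :2] / s

def skin_likelihood(image, mean, cov):
    """Per-pixel likelihood under one Gaussian skin color model
    defined in rg chromaticity space."""
    rg = rg_chromaticity(image.astype(float))
    d = rg - mean                                 # (H, W, 2) residuals
    inv = np.linalg.inv(cov)
    m = np.einsum('hwi,ij,hwj->hw', d, inv, d)    # squared Mahalanobis distance
    norm = 1.0 / (2 * np.pi * np.sqrt(np.linalg.det(cov)))
    return norm * np.exp(-0.5 * m)
```

Each candidate model produces such a likelihood map for the current frame; the context-driven switching scheme then keeps the model whose map best agrees with the expected output, e.g. under the mutual information criterion of Section 3.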
Conclusion and future work
This paper introduced context-driven model switching, where the current context is used to choose from a set of previously learned object models. More specifically, the validity of a model in the current context is tested by comparing the resulting output distribution to an a priori expectation. We derive a mutual information criterion for this comparison and show that the object model which maximizes mutual information also best matches the current context. We also derive a reference
References (15)
- A. Blake, M. Isard, Active Contours: The Application of Techniques from Graphics, Vision, Control Theory and Statistics...
- T. Cover, J. Thomas, Elements of Information Theory, Wiley, New York,...
- M. Isard, A. Blake, Icondensation: Unifying low-level and high-level tracking in a stochastic framework, in:...
- M. Isard, A. Blake, A mixed-state condensation tracker with automatic model-switching, in: Proceedings of the Sixth...
- H. Kruppa, B. Schiele, Using mutual information to combine object models, in: Proceedings of the Eighth International...
- A. Maki, J.-O. Eklundh, P. Nordlund, A computational model of depth-based attention, in: Proceedings of the...
- et al., Neural network-based face detection, IEEE Transactions on Pattern Analysis and Machine Intelligence (1998)
Cited by (3)
- Intelligent robotic systems - SIRS'2001, 2002, Robotics and Autonomous Systems
- Automatic resource allocation in a distributed camera network, 2010, Machine Vision and Applications
- Rigid medical image registration and its association with mutual information, 2003, International Journal of Pattern Recognition and Artificial Intelligence
Hannes Kruppa received undergraduate degrees in Computer Science and Medicine from the University of Tuebingen, Germany, and a graduate degree in Computer Science from the Swiss Federal Institute of Technology, ETH Zurich, in 1999. His diploma thesis, contributing to probabilistic multi-robot localization, was carried out at CMU, Pittsburgh, USA, where he stayed as a Visiting Scientist with the group of Sebastian Thrun. Since October 1999, he has been a Ph.D. student in the Perceptual Computing and Computer Vision Group at ETH Zurich. His research focuses on statistical modeling for pattern recognition.
Martin Spengler received a graduate degree in Computer Science from the Swiss Federal Institute of Technology, ETH Zurich, in 2000. Since October 2000, he has been a Ph.D. student with the Perceptual Computing and Computer Vision Group at ETH Zurich. His research focuses on visual tracking and object modeling.
Bernt Schiele is an Assistant Professor of computer science at ETH Zurich and has headed the Perceptual Computing and Computer Vision Group since 1999. He received a degree in computer science from the University of Karlsruhe, Germany, as well as from INPG Grenoble, France. In 1994 he was a visiting scientist at CMU, Pittsburgh, PA, USA. In 1997 he obtained his Ph.D. in computer vision from INPG Grenoble, France. Between 1997 and 1999 he was a postdoctoral associate at the MIT Media Laboratory, Cambridge, MA, USA. His main research interests are computer vision, perceptual computing, statistical learning methods, wearable computing, and the integration of multi-modal sensor data. He is particularly interested in developing robust perception methods that work under real-world conditions.