Context-driven model switching for visual tracking

https://doi.org/10.1016/S0921-8890(02)00259-2

Abstract

A major challenge for real-world object tracking is the dynamic nature of environmental conditions with respect to illumination, motion, visibility, etc. In such environments, which may change drastically at any time, integrating multiple complementary cues promises to increase the robustness of visual tracking. Nevertheless, one has to expect that false positive tracking will occur. To enable recovery from such tracking failures, this paper introduces a novel method, based on information-theoretic concepts, for automatically choosing the object model that best fits the current context. To validate the effectiveness of the proposed model switching, it is integrated into a multi-cue face tracking system and evaluated experimentally.

Introduction

Many computer vision algorithms have proven to work well in controlled environments or specific contexts. Most often, however, real-world environmental conditions are anything but controlled or known a priori. Computer vision algorithms such as tracking should therefore be able to deal with dynamic and even dramatic changes of the environment. In the context of robotics, outdoor environments pose a serious challenge to most existing computer vision algorithms. But even indoors, unpredicted and drastic changes may occur when the robot moves between rooms, when the illumination varies with the weather, or when lights are turned off or on. Other dynamic changes with respect to pose, motion, or occlusion are similarly demanding. To deal with dynamically changing environments and with false positive tracking, this paper proposes a technique for selecting the model most appropriate for the current context. The ultimate goal of the proposed approach is to overcome the limitations of a set of individual models by selecting the best-suited model on-the-fly, using information-theoretic concepts. To evaluate the effectiveness of the method, the approach is integrated into a tracking system. As an example application we consider tracking of a walking human. In that scenario, abrupt direction changes of the person and sudden illumination changes may cause tracking failures with which the system has to deal.

The remainder of the paper is organized as follows. Section 2 reviews related work. Section 3 introduces the concept of mutual information and how it is used in the proposed model switching scheme. As an application, a multi-cue human face tracking system is introduced in Section 4. Experimental results show that the proposed model switching scheme increases the robustness of the system with respect to false positive tracking in dynamically changing environments. Conclusions and future work are discussed in Section 5.

Section snippets

Related work

There seems to be a general consensus that integrating information from different sensors and models is important for increasing the robustness of today’s computer vision algorithms. Whereas sensor fusion is a common research topic in robotics, relatively little such research has been done in computer vision. Toyama and Hager [12], for example, propose a layered hierarchy of vision-based tracking algorithms aiming to enable robust and adaptive tracking in real time. When the conditions

A mutual information criterion for switching between detection models

For visual tracking in the real world, one would like to choose the model that best matches the current context. To this end, the proposed approach maintains a set of candidate detection models, each of which models the object of interest under a distinct environmental condition. These models have been learned in a previous stage, for example by maximum likelihood estimation, and their parameters remain fixed in our selection scheme. Applying each candidate model to the current
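The selection rule can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes each candidate model's response has already been tabulated against the a priori expected distribution in a joint probability table, and it selects the model whose table yields maximal mutual information.

```python
import numpy as np

def mutual_information(joint):
    """Mutual information I(X;Y) in bits, computed from a joint
    probability table over (model response, expected distribution)."""
    joint = joint / joint.sum()                      # normalize to probabilities
    px = joint.sum(axis=1, keepdims=True)            # marginal of model response
    py = joint.sum(axis=0, keepdims=True)            # marginal of expectation
    nz = joint > 0                                   # skip zero cells (0 log 0 = 0)
    return float((joint[nz] * np.log2(joint[nz] / (px @ py)[nz])).sum())

def select_model(joint_tables):
    """Return the index of the candidate model whose output best matches
    the a priori expectation, i.e. maximizes mutual information."""
    scores = [mutual_information(j) for j in joint_tables]
    return int(np.argmax(scores)), scores
```

A model whose response is perfectly predictive of the expected distribution scores high (1 bit for a balanced binary case), while a model whose response is independent of it scores zero and is rejected.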

Application: robust tracking of human heads

In order to test the context-driven model switching scheme proposed in the previous sections, this section describes its application to people tracking. In particular, we extended an existing multi-cue head tracking system [10] with context-driven switching between skin color models.
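As a rough illustration of switching between pre-learned skin color models, the following sketch scores each candidate color histogram by how well its binarized response agrees with an assumed expected skin fraction; this simple agreement score stands in for the paper's mutual information criterion, and all names, bin counts, and thresholds here are hypothetical.

```python
import numpy as np

def skin_likelihood(frame_hue, model_hist, bins=32):
    """Per-pixel skin likelihood: look up each hue value (in [0, 1)) in a
    pre-learned normalized hue histogram (one per illumination context)."""
    idx = np.clip((frame_hue * bins).astype(int), 0, bins - 1)
    return model_hist[idx]

def pick_skin_model(frame_hue, models, expected_skin_fraction=0.05):
    """Hypothetical switching rule: choose the color model whose binarized
    response is closest to the a priori expected fraction of skin pixels."""
    best, best_err = 0, np.inf
    for k, hist in enumerate(models):
        fraction = (skin_likelihood(frame_hue, hist) > 0.5).mean()
        err = abs(fraction - expected_skin_fraction)
        if err < best_err:
            best, best_err = k, err
    return best
```

Under a sudden illumination change, the histogram learned for the new lighting condition classifies roughly the expected fraction of pixels as skin, while a mismatched model classifies far too few or too many, so the scheme switches automatically.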

One important problem of visual tracking in general and head tracking in particular is sudden changes in the perceived environment. Unexpected and unpredictable situations are likely to disturb even highly

Conclusion and future work

This paper introduced context-driven model switching, where the current context is used to choose from a set of previously learned object models. More specifically, the validity of a model in the current context is tested by comparing the resulting output distribution to an a priori expectation. We derive a mutual information criterion for this comparison and show that the object model which maximizes mutual information also best matches the current context. We also derive a reference

References (15)

  • A. Blake, M. Isard, Active Contours: The Application of Techniques from Graphics, Vision, Control Theory and Statistics...
  • T. Cover, J. Thomas, Elements of Information Theory, Wiley, New York,...
  • M. Isard, A. Blake, Icondensation: Unifying low-level and high-level tracking in a stochastic framework, in:...
  • M. Isard, A. Blake, A mixed-state condensation tracker with automatic model-switching, in: Proceedings of the Sixth...
  • H. Kruppa, B. Schiele, Using mutual information to combine object models, in: Proceedings of the Eighth International...
  • A. Maki, J.-O. Eklundh, P. Nordlund, A computational model of depth-based attention, in: Proceedings of the...
  • H. Rowley et al., Neural network-based face detection, IEEE Transactions on Pattern Analysis and Machine Intelligence (1998)

Hannes Kruppa received undergraduate degrees in Computer Science and Medicine from the University of Tuebingen, Germany, and a graduate degree in Computer Science from the Swiss Federal Institute of Technology, ETH Zurich, in 1999. His diploma thesis, a contribution to probabilistic multi-robot localization, was carried out at CMU, Pittsburgh, USA, where he stayed as a Visiting Scientist with the group of Sebastian Thrun. Since October 1999, he has been a Ph.D. student in the Perceptual Computing and Computer Vision Group at ETH Zurich. His research focuses on statistical modeling for pattern recognition.

Martin Spengler received a graduate degree in Computer Science from the Swiss Federal Institute of Technology, ETH Zurich, in 2000. Since October 2000 he has been a Ph.D. student with the Perceptual Computing and Computer Vision Group at ETH Zurich. His research focuses on visual tracking and object modeling.

Bernt Schiele has been an Assistant Professor of computer science at ETH Zurich and head of the Perceptual Computing and Computer Vision Group since 1999. He received a degree in computer science from the University of Karlsruhe, Germany, as well as from INPG Grenoble, France. In 1994 he was a visiting scientist at CMU, Pittsburgh, PA, USA. In 1997 he obtained his Ph.D. in computer vision from INPG Grenoble, France. Between 1997 and 1999 he was a postdoctoral associate at the MIT Media Laboratory, Cambridge, MA, USA. His main research interests are computer vision, perceptual computing, statistical learning methods, wearable computing, and the integration of multi-modal sensor data. He is particularly interested in developing robust perception methods that work under real-world conditions.
