Elsevier

Pattern Recognition

Volume 40, Issue 7, July 2007, Pages 1929-1945

Shadow resistant tracking using inertia constraints

https://doi.org/10.1016/j.patcog.2005.09.014

Abstract

In this paper, we present a new method for tracking objects with shadows. Traditional motion-based tracking schemes cannot usually distinguish the shadow from the object itself, and this results in a falsely captured object shape. If we want to utilize the object's shape information for a pattern recognition task, this poses a severe difficulty. In this paper we present a color processing scheme to project the image into an illumination invariant space such that the shadow's effect is greatly attenuated. The optical flow in this projected image together with the original image is used as a reference for object tracking so that we can extract the real object shape in the tracking process. We present a modified snake model for general video object tracking. Two new external forces are introduced into the snake equation based on the predictive contour and a new chordal string shape descriptor such that the active contour is attracted to a shape similar to the one in the previous video frame. The proposed method can deal with the problem of an object's ceasing movement temporarily, and can also avoid the problem of the snake tracking into the object interior. Global affine motion estimation is applied to mitigate the effect of camera motion, and hence the method can be applied in a general video environment. Experimental results show that the proposed method can track the real object even if there is strong shadow influence.

Introduction

Shadows present a confounding factor for correct object tracking. Traditional motion detection schemes cannot distinguish the moving object from the shadows moving with it. Therefore, tracking results based on traditional schemes usually produce contours that combine the object and its shadow. Such a result poses severe difficulties if the contour is passed on to an analyzer for object recognition. Eliminating the shadow and tracking the real contour of an object is a challenging problem. Different schemes have been presented to attenuate the shadow's influence in applications such as object tracking and still image segmentation. In Ref. [1] the shadow detection and elimination problem was studied in the context of road surveillance. In that application, lighting conditions were restricted to sunlight, the internal and external parameters of the video camera were fixed, and the target object was restricted to a walking human being. A method was presented for locating the real position of a walking human by extracting the core lines of the human and the core lines of the shadows from a motion detection map. In Ref. [2] a geometrical scheme based on stereo vision was presented for shadow elimination in surveillance video tracking. The scheme is based on image subtraction: the image captured by one camera is first projected onto the road plane and then projected onto the image plane of the second camera. The road regions of the two images should then coincide, while other parts, such as walking humans, will not. The difference of the two images, thresholded by a given value, yields a mask that eliminates everything on the road plane, including the moving shadows. In Ref. [3], simple illumination invariant features were applied to obtain an image that captures differences between surface materials.
Since cast shadows only change the illumination of backgrounds, the illumination invariant features attenuate shadow effects. The method is applied in the context of still image segmentation. In Ref. [4], a statistical method is presented for pixel classification. The features used include the luminance and the normalized chrominance vector. The color change of a pixel is described by multiplying each color channel by a constant, i.e., by a diagonal model of color change. Pixels are classified into three classes (background, foreground, and shadow) by maximum a posteriori classification. Spatial information is also applied to improve the classification result in dense regions. In Ref. [5], the shadow detection problem is studied based on a model similar to the Phong model. Heuristic methods are presented to separate the shadow from the foreground object. A recent shadow detection scheme designed for outdoor scenes is presented in Ref. [6].

In this paper, we present a different method, based on a physics-based illumination invariant color space, and on an inertia-enhanced snake model, for reliable object tracking in a general video environment. If lighting is approximately Planckian, then in Wien's approximation the resulting simple exponential form of the illumination spectrum leads to the conclusion that as temperature T changes, characterizing the illumination color, a log–log plot of two-dimensional {log(R/G),log(B/G)} values for any single surface forms a straight line provided camera sensors are fairly narrow-band [7], [8], [9]. Thus, lighting change reduces to a linear transformation along an almost straight line, even for real data with only approximately Planckian lighting. For a target with many paint patches, mean-subtracted log–log plots all cluster around a single line through the origin that characterizes lighting change. The invariant image is thus the gray-scale image that results from projecting log–log pixel values onto the direction orthogonal to lighting change, within and outside the umbra; the projection greatly attenuates shadowing.
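The projection described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the calibration procedure of the paper: the function names `estimate_lighting_direction` and `invariant_image` are hypothetical, the lighting-change direction is estimated here with a plain SVD of mean-subtracted log-chromaticities, and pixel values are assumed to be linear and strictly positive.

```python
import numpy as np

def estimate_lighting_direction(patch_logs):
    """Estimate the angle of the lighting-change line in the log-log plane.

    patch_logs: list of (n_i, 2) arrays, one per surface patch, each row a
    (log(R/G), log(B/G)) sample of that patch under a different illuminant.
    Assumption: the dominant direction is found by SVD, which is one simple
    way to fit a line through the origin to mean-subtracted data.
    """
    # Subtract each patch's own mean so all patches cluster around one line.
    centered = np.vstack([p - p.mean(axis=0) for p in patch_logs])
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    d = vt[0]  # dominant direction, defined only up to sign
    return np.arctan2(d[1], d[0])

def invariant_image(rgb, theta):
    """Project log-chromaticities onto the direction orthogonal to theta.

    rgb: float array of shape (H, W, 3); theta: lighting-change angle in
    radians. Returns a gray-scale (H, W) array.
    """
    eps = 1e-6  # guard against log(0); an assumption of this sketch
    rgb = np.maximum(np.asarray(rgb, dtype=np.float64), eps)
    log_rg = np.log(rgb[..., 0] / rgb[..., 1])
    log_bg = np.log(rgb[..., 2] / rgb[..., 1])
    # Unit vector orthogonal to the lighting direction (cos theta, sin theta).
    ortho = np.array([-np.sin(theta), np.cos(theta)])
    return log_rg * ortho[0] + log_bg * ortho[1]
```

Two pixels of the same surface whose log-chromaticities differ only by a shift along the lighting direction map to the same gray value, which is why a cast shadow is attenuated in the projected image. The sign ambiguity of the SVD direction only flips the sign of the gray-scale image.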

Based on this color projection, we further present an inertia-enhanced snake model for tracking objects with shadows. We devise two inertia terms. The first term is based on the predictive contour, and the second is based on a new chordal shape descriptor. These two additional terms force the active contour to converge to a shape similar to the one in the previous video frame. The inertia energy terms make the snake ignore distracting elements, and thus no precise initial contour is needed. Moreover, if the object stops moving temporarily, the snake evolves according to the predictive contour term and the chordal constraint term, and converges to a shape that is similar to the one in the previous frame and consistent with the motion prediction result. We adopt an affine motion model for global motion estimation and camera motion compensation, so that our scheme can work in a general video environment. Compared with other standard contour tracking schemes such as [10], [11], [12], the proposed scheme does not need a training process. The complexity of the algorithm is comparable to that of the standard snake, and the algorithm is thus suited for real-time applications.

The organization of the paper is as follows. We first study shadow-invariant image space in Section 2. We show that under Planckian lighting, the log–log plot of ratios (log(R/G),log(B/G)) forms a straight line for each material, for narrow-band sensor cameras. Based on this observation, we set out a camera calibration scheme for shadow-invariant image generation in Section 2.1. In Section 3, we present the tracking scheme based on an inertia snake model and shadow-invariant image for shadow resistant video tracking. The modified snake equation is studied in Section 3.1. Contour prediction based on iterative conditional modes (ICM) is presented in Section 3.2. In Section 3.3, we show how to generate external forces based on global motion compensated motion detection and gradient vector flow. The numerical scheme for the proposed snake equation is presented in Section 3.5, and the tracking system is presented in Section 4. Experiments, results, and discussions are presented in Section 5.

Section snippets

Shadow-invariant image space

Shadows are usually classified as self-shadows and cast shadows. Self-shadows result from part of the object blocking some light from another part of the same object. Self-shadows usually pose a minor problem for tracking tasks. However, cast shadows, caused by one object shadowing another object in the scene, can be caused by the background objects shadowing the tracking target or the target's own shadow on the background object. We are most interested in cast shadows, especially the shadow

An inertia snake model

The traditional 2D snake is a deformable curve $X(s)=[x(s),y(s)]$, where $s$ is a parameter in the range $[0,1]$. The contour is determined by an energy minimization problem. Contour $X$ is that which minimizes the system energy $E$, defined as

$$E(X)=\int_0^1 \left[\frac{\alpha}{2}\,\|\nabla X(s)\|^2+\frac{\beta}{2}\,\|\nabla^2 X(s)\|^2+P(X(s))\right]ds,$$

where $\alpha$ and $\beta$ are parameters to control the internal tension (stretching) and stiffness (bending) of the contour, respectively; $\nabla X(s)=(dx(s)/ds,\,dy(s)/ds)$ and $\nabla^2 X(s)=(d^2x(s)/ds^2,\,d^2y(s)/ds^2)$; $\|(x,y)\|=\sqrt{x^2+y^2}$; and $P(X)$ is an
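As a rough illustration of the traditional snake energy defined in this section, the integral can be approximated on a sampled contour with finite differences. The sketch below is an assumption-laden example, not the numerical scheme of the paper: it assumes a closed contour sampled at N uniformly spaced parameter values, and the function name `snake_energy` and the callable `potential` (standing in for the external term P) are hypothetical.

```python
import numpy as np

def snake_energy(points, alpha, beta, potential):
    """Approximate the traditional snake energy of a closed sampled contour.

    points: (N, 2) array of contour samples X(s_k) with s_k = k / N.
    alpha, beta: tension and stiffness weights.
    potential: callable (x, y) -> float, a stand-in for the external term P.
    """
    points = np.asarray(points, dtype=np.float64)
    n = len(points)
    h = 1.0 / n  # parameter step, since s runs over [0, 1]
    nxt = np.roll(points, -1, axis=0)  # X(s_{k+1}), wrapping around
    prv = np.roll(points, 1, axis=0)   # X(s_{k-1}), wrapping around
    d1 = (nxt - points) / h                    # first derivative (forward difference)
    d2 = (nxt - 2.0 * points + prv) / h ** 2   # second derivative (central difference)
    internal = 0.5 * alpha * np.sum(d1 ** 2, axis=1) \
             + 0.5 * beta * np.sum(d2 ** 2, axis=1)
    external = np.array([potential(x, y) for x, y in points])
    # Riemann sum over the parameter s.
    return float(np.sum(internal + external) * h)
```

With a zero potential and beta set to zero, a smaller circle has lower energy than a larger one, which shows how the tension term alone tends to shrink the contour unless external forces hold it in place.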

Shadow resistant tracking system

In this section we present the overall system for our tracking scheme. The system diagram is shown in Fig. 4.

The system is based on an incremental scheme. Two consecutive frames are used in the global motion estimation, motion detection, and contour prediction steps. The scheme can also be easily extended to the three-frame or multiframe model. The shadow resistant tracking algorithm is as follows:

Algorithm 1

1. Fetch frame i, denoted by F(i); previous frame is F(i-1).
2. Calculate the affine transformation

Experimental results

First, we applied the proposed tracking method to video sequences with little shadow interference. Both moving and fixed camera situations were tested. In these experiments, only the source color video, and not the shadow-invariant version, is used to estimate object motions. Figs. 5, 6, 7, and 8 show the tracking results based on the proposed scheme.

For shadow resistant object tracking, we used a consumer camcorder (Canon ES60) in our experiments—the method is robust against gamma

Conclusion

We present an efficient algorithm for tracking objects that is resistant to shadows. The algorithm eliminates the distracting influence from shadows and tracks the shape of the actual object. Shadow removal is based on a preliminary, simple, camera calibration. Shadow resistant tracking can be very useful for higher level vision processing such as gesture or behavior recognition. Inertia terms we introduce into the variational problem tend to preserve the object boundary shape between frames,

About the Author: HAO JIANG is currently a Ph.D. candidate in the School of Computing Science at Simon Fraser University, Vancouver, Canada. His research interests are multimedia, computer vision, signal processing and communications, graphics and AI. His thesis focuses on novel approaches to energy minimization methods in computer vision and pattern recognition.

References (21)

  • K. Onoguchi, Shadow elimination method for moving object detection, in: Proceedings of the 14th International...
  • Y. Sonoda, T. Ogata, Separation of moving objects and their shadows, and application to tracking of loci in the...
  • E. Salvador, A. Cavallaro, T. Ebrahimi, Shadow identification and classification using invariant color models,...
  • I. Mikic, P. Cosman, G. Kogut, M. Trivedi, Moving shadow and object detection in traffic scenes, International...
  • J. Stauder et al., Detection of moving cast shadows for object segmentation, IEEE Trans. Multimedia (1999)
  • S. Nadimi et al., Physical models for moving shadow and object detection in video, IEEE Pattern Anal. Mach. Intell. (2004)
  • G.D. Finlayson et al., Color constancy at a pixel, J. Opt. Soc. Amer. A (2001)
  • G.D. Finlayson, M.S. Drew, 4-sensor camera calibration for image representation invariant to shading, shadows,...
  • G.D. Finlayson, S.D. Hordley, M.S. Drew, Removing shadows from images, in: ECCV 2002: European Conference on Computer...
  • A. Blake et al., Affine-invariant contour tracking with automatic control of spatial-temporal scale

There are more references available in the full text version of this article.


About the Author: MARK S. DREW is an Associate Professor in the School of Computing Science at Simon Fraser University, Vancouver, Canada. His background education is in Engineering Science, Mathematics, and Physics. His interests lie in the fields of multimedia, computer vision, image processing, color, photorealistic computer graphics, and visualization. He has published over 100 refereed papers in journals and conference proceedings. Dr. Drew is the holder of a US Patent in digital color processing.
