Elsevier

Pattern Recognition

Volume 34, Issue 3, March 2001, Pages 661-670

Motion-based segmentation and region tracking in image sequences

https://doi.org/10.1016/S0031-3203(00)00014-5

Abstract

This paper presents an algorithm for segmenting and tracking moving objects in a scene. Temporal information provided by a region tracking strategy is integrated for improving frame-to-frame motion segmentation. The method has been applied to a traffic monitoring system and it provides facilities such as estimating trajectories of vehicles, detecting stopped vehicles, counting vehicles and estimating the mean velocity of the traffic.

Introduction

The method presented in this paper is part of a traffic monitoring system based on motion segmentation and tracking techniques. This system provides important facilities such as surveillance of vehicle trajectories, vehicle shape segmentation and generation of information about traffic state. These facilities are very useful for detecting traffic problems, and in addition may help to improve road design.

Tracking is a common procedure in application domains such as surveillance and visual servoing. Its aim is to match entities that appear in different frames of a sequence. Several methods have been proposed on this topic, but we can find four main trends that group the majority of these works:

1. 3-D-based methods. These consist of precise geometrical representations of known objects. Using knowledge about the geometry of the camera and the scene, a three-dimensional model is projected onto the image. This type of method imposes a tremendous computational load that does not seem justified by the requirements of a traffic monitoring system. Nevertheless, such methods have been applied to tracking individual vehicles using expensive hardware [1], [2], [3], [4].

2. Feature-based methods track individual tokens such as points [5], [6], lines [7], [8] or curves [9], usually based on matching schemes. These methods present two main disadvantages [10], [11]: they do not provide explicit grouping of tokens moving with coherent motion, and are quite sensitive to occlusion.

3. Deformable model-based methods [12], [13], [14] fit models to the contours of the moving objects in the scene; the models are then tracked from one frame to the next. They are very suitable for structured scenes, but exhibit initialization problems [6], [10]. When moving objects are partially occluded in the scene, as usually happens in traffic, initialization fails, since the models cannot be fitted to the real objects.

4. Region-based methods define groups of connected pixels that are detected as belonging to a single object moving with a motion different from that of its neighbouring regions [15], [16], [17]. Region tracking is less sensitive to occlusion, owing to the extensive information that regions supply: characteristics such as size, shape or intensity can be obtained from them directly. Moreover, regions are very suitable for scenes with a stationary background, since once motion has been estimated for the regions, this information allows us to separate moving regions from stationary ones.
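To illustrate the core of a region-based method, the following sketch groups 4-connected "moving" pixels of a binary motion mask into regions using a breadth-first flood fill. This is a minimal stand-in, assuming a simple 0/1 mask representation; the function name and data layout are not the paper's implementation.

```python
from collections import deque

def extract_regions(mask):
    """Group connected 'moving' pixels (4-connectivity) into regions.

    `mask` is a list of lists of 0/1 values, where 1 marks a pixel whose
    estimated motion differs from the stationary background.
    Returns a list of regions, each a list of (row, col) pixels.
    """
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                # Breadth-first flood fill collects one connected region.
                queue, region = deque([(r, c)]), []
                seen[r][c] = True
                while queue:
                    y, x = queue.popleft()
                    region.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                regions.append(region)
    return regions
```

Once such regions are available, per-region properties (size, shape, centroid) follow directly, which is what makes them convenient tracking entities.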

With the exception of a few works, the benefits of tracking moving objects to improve the segmentation have been ignored. Irani et al. [18] use temporal integration of motion segmentation results to improve performance. This method does not use shape tracking; hence, information about scene events is not readily available. The method of Meyer and Bouthemy [19] tracks polygonal representations of moving objects; occlusions are detected by comparing a predicted image region with a measured region. Smith and Brady [20] use a radial map for storing vehicle shapes. This map is combined with the segmentation of the current frame to produce the final segmentation, which allows the method to detect occlusions and to keep the approximate shape of the vehicles in these situations. In the work by Mae et al. [21], discontinuities in the flow field are matched to intensity discontinuities (edges). These correspondences represent moving edges that are accumulated and used in two ways: as predictions of the edge positions in the current frame, and as substitutes for lost edges to obtain connected contours.

Our approach is based on a region tracking algorithm that takes advantage of common characteristics of traffic monitoring scenes. In traffic scenes, the background is stationary, and vehicles sometimes appear partially occluded behind other vehicles. A region-based method is therefore more suitable, since it can deal with these situations without posing initialization problems. Moreover, regions are well suited to tracking objects, since they determine their shape and location quite accurately. Regions present advantages over both models and simple feature-based methods: it is not necessary to group simple features with poor information to form objects, as happens with lines or points, and they do not present the initialization problem that restricts the use of model-based methods.

The work presented here has been motivated by the idea that not only should segmentation provide information for tracking, but tracking can also supply information for improving the segmentation. Tracking matches entities that appear in different frames; therefore, it can be the means for a feedback segmentation process with temporal knowledge spanning more than a few frames. Knowing the evolution of the shape of an object over n frames, it is possible to predict its shape in frame n+1. In this way, we can correct a frame segmentation in which entities appear, disappear or change their shape suddenly. This is the main difference from other methods, which perform a definitive segmentation before tracking.
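The prediction idea above can be sketched as a simple linear extrapolation over a region's recent history. The (cx, cy, area) state layout is hypothetical and chosen only for illustration; the paper's shape representation is richer than a centroid and an area.

```python
def predict_next(history):
    """Linearly extrapolate a region's next state from its history.

    `history` is a chronological list of (cx, cy, area) tuples, one per
    frame.  With two or more observations we extrapolate the trend; with
    a single observation we can only repeat it.
    """
    if len(history) == 1:
        return history[-1]
    (x0, y0, a0), (x1, y1, a1) = history[-2], history[-1]
    # Constant-velocity assumption: next = last + (last - previous).
    return (2 * x1 - x0, 2 * y1 - y0, 2 * a1 - a0)
```

A frame segmentation in which a tracked region suddenly vanishes or changes shape can then be checked against such a prediction before being accepted.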

The rest of the paper has the following structure: the next section describes the proposed method; Section 3 explains some of the facilities that the method provides for traffic monitoring; Section 4 shows results over real traffic sequences; and Section 5 draws some conclusions.

Section snippets

Region segmentation and tracking

The whole algorithm is summarized in Fig. 1. For every frame of the sequence, our approach carries out a frame-to-frame motion segmentation, followed by a matching process. This process matches regions segmented in previous frames to regions segmented in the current frame. Matching allows the system to follow the evolution of the regions, which is stored in a master list containing the temporal history of all the segmented regions. Regions segmented in the current frame that are
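The matching step and the master list described above can be sketched as follows, with tracks matched to current-frame regions by nearest centroid. The names, the dictionary-based master list, and the plain distance test are illustrative assumptions; the paper's matching criterion is richer than centroid distance alone.

```python
def update_master_list(master, current, max_dist=20.0):
    """One matching step between tracked regions and the current frame.

    `master` maps a track id to its chronological centroid history;
    `current` is the list of region centroids segmented in the current
    frame.  Each current region is matched to the closest still-unmatched
    track within `max_dist` pixels; unmatched regions open new tracks.
    """
    unmatched = dict(master)               # tracks still available this frame
    next_id = max(master, default=-1) + 1
    for (cx, cy) in current:
        best, best_d = None, max_dist
        for tid, hist in unmatched.items():
            px, py = hist[-1]
            d = ((cx - px) ** 2 + (cy - py) ** 2) ** 0.5
            if d <= best_d:
                best, best_d = tid, d
        if best is not None:
            master[best].append((cx, cy))  # extend the temporal history
            del unmatched[best]
        else:
            master[next_id] = [(cx, cy)]   # a region seen for the first time
            next_id += 1
    return master
```

Tracks that receive no match in a frame would, in a fuller version, be kept alive for a few frames to bridge occlusions before being discarded.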

Traffic monitoring facilities

As mentioned in the introduction, the motivation of this work was to develop a traffic monitoring system. This system has to provide the following basic facilities: counting vehicles and estimating the mean traffic velocity, trajectory surveillance, and detection of stopped vehicles.

Counting vehicles: When a region is declared a permanent region, a counter (CT) that counts the number of observed vehicles is increased. In addition, when the motion direction indicates that the vehicle is moving away from

Results

This method has been tested with several traffic sequences. In Fig. 3, Fig. 4 we present two of these sequences to show some of the facilities of the system. The first column of these figures presents some sample frames of the sequences. The second column contains the frame-to-frame segmentation of each image, and the third column shows the final segmentation obtained by applying the whole proposed algorithm, that is, the result of integrating the segmentation information over several frames.

Fig. 3 shows

Conclusions

In this work we have presented a new approach for segmenting and tracking moving objects. Our method integrates segmentations provided by a frame-to-frame motion segmentation and accumulates segmentation results to obtain an improved segmentation. In contrast to most methods, ours not only uses segmentation to achieve object tracking, but also uses the tracking to improve the segmentation.

The experiments carried out have shown that our approach stabilizes the shape of the vehicles along

About the Author—JORGE BADENAS received his degree in Computer Science from the Universidad Politécnica de Valencia in 1991 and his M.Sc. degree in Multimedia Systems and Technologies from the University of Surrey, United Kingdom. He works as a Reader (Profesor Titular de Escuela Universitaria) in the Department of Computer Science at Universitat Jaume I in Castellón, Spain. His research interests are segmentation and motion analysis.

References (25)

  • Z. Zhang et al., Three-dimensional motion computation and object segmentation in a long sequence of stereo frames, Int. J. Comput. Vision (1992)
  • A. Mitiche et al., Computation and analysis of image motion: a synopsis of current problems and methods, Int. J. Comput. Vision (1996)


About the Author—JOSÉ MIGUEL SANCHIZ received a degree in Telecommunication Engineering from Universitat Politècnica de Catalunya (Barcelona, Spain) in 1985, a degree in Physics with Best Curriculum Award from Universidad Nacional de Educación a Distancia (Madrid, Spain) in 1993, and a Ph.D. in Computer Engineering from Universitat Jaume I (Castellón, Spain) in 1997. Dr. Sanchiz works as a Reader (Profesor Titular de Universidad) at Universitat Jaume I, and was an Associate Researcher at the University of Edinburgh during 1999. In recent years he has been working on feature point detection, tracking and motion estimation, fruit and vegetable sorting by machine vision, and sensor planning for environment recovery from range images.

About the Author—FILIBERTO PLA received his degree in Physics from the University of Valencia in 1989 and his Ph.D. degree in Physics from the same university in 1993. He was a research fellow at Silsoe Research Institute, UK, in 1993, and at the University of Surrey, UK, in 1996. He is currently an Assistant Professor in the Department of Computer Science at Universitat Jaume I in Castellón, Spain. His research interests are colour image processing, motion estimation, stereo and pattern recognition, and he has led several projects in these areas. Dr. F. Pla is a member of AERFAI (Spanish Association for Pattern Recognition and Image Analysis) and IAPR (International Association for Pattern Recognition).

    This work was partially supported by the projects ESPRIT PROJECT EP-21007, TIC98-0677-C022, Generalitat Valenciana GV97-TI-05-26 and GV97-TI-05-27.
