Elsevier

Pattern Recognition Letters

Volume 23, Issue 14, December 2002, Pages 1761-1769

A multiresolution spatiotemporal motion segmentation technique for video sequences based on pyramidal structures

https://doi.org/10.1016/S0167-8655(02)00150-2

Abstract

This paper presents a new spatiotemporal segmentation technique for video sequences. It relies on building adaptively interlinked pyramids over consecutive frames. The pyramids are interlinked to maintain the relationship between the regions in the frames. Because the technique does not depend on image constraints, it performs well in real-world conditions.

Introduction

Segmentation is a tool that has been widely used in image processing and computer vision for a variety of applications. The process consists of dividing a scene into a number of regions that are homogeneous with respect to one or several of its features. Like most image processing techniques, segmentation may be computationally expensive because images often contain a large amount of data. Pyramids have typically been used to reduce the computational load required for image processing.

Tanimoto and Pavlidis (1975) introduced pyramids in image processing. They used the term to describe a multiresolution structure in which each successive level has half the dimensions of the level below. Later, Burt et al. (1981) developed the linked pyramid, which provides a framework for an iterative image segmentation process. Unlike previous pyramids, it has explicit links that represent the relationships between cells in adjacent levels. These links are rearranged iteratively so that each node is linked to a homogeneous region of nodes at every level below. Once the process has converged, the tree structure of links within the pyramid defines the segmentation of the image. Kasif and Rosenfeld (1983) proved that the process always converges. Although the structure presents some limitations, as pointed out by Bister et al. (1990), Hird and Wilson (1989) demonstrated that hierarchical segmentation on a pyramid yields better results than conventional 2-D techniques. Pietikäinen and Rosenfeld (1981) used these structures for texture measures, and Tate and Li (1992) applied them to stereo matching.

Pyramids have already been used for motion estimation and spatiotemporal segmentation (Luthon et al., 1999; Mahzoun et al., 1999; Tan and Martin, 1989; Mutch and Thompson, 1984). However, existing pyramid-based approaches use the structure in a coarse-to-fine way: they estimate motion with classic 2-D spatiotemporal segmentation procedures at coarse levels and then propagate the results to higher-resolution levels, correcting or predicting propagation errors. Our method instead uses the structure itself in a bottom-up way to segment consecutive frames at the same time. It does not depend on conventional 2-D motion estimation methods. Further advantages of our approach are discussed in detail throughout the paper.
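The multiresolution structure described above, in which each level has half the dimensions of the level below, can be illustrated with a minimal sketch. It assumes a square greyscale image whose side is a power of two and uses plain 2×2 averaging as the reduction rule; the function names `reduce_level` and `build_pyramid` are hypothetical and are not taken from the paper or from the cited works.

```python
def reduce_level(level):
    """Halve both dimensions by averaging each non-overlapping 2x2 block."""
    rows, cols = len(level), len(level[0])
    return [
        [
            (level[r][c] + level[r][c + 1]
             + level[r + 1][c] + level[r + 1][c + 1]) / 4.0
            for c in range(0, cols, 2)
        ]
        for r in range(0, rows, 2)
    ]


def build_pyramid(image):
    """Return [base, level 1, ..., apex] for a square power-of-two image.

    Assumption: plain 2x2 averaging; other reduction rules are possible.
    """
    side = len(image)
    if side == 0 or side & (side - 1) or any(len(row) != side for row in image):
        raise ValueError("expected a square image with a power-of-two side")
    levels = [[[float(v) for v in row] for row in image]]
    # Keep reducing until the apex (a single cell) is reached.
    while len(levels[-1]) > 1:
        levels.append(reduce_level(levels[-1]))
    return levels
```

Under these assumptions, a 256×256 frame would produce nine levels, from the 256×256 base down to a 1×1 apex, and each level holds a quarter of the cells of the level below it.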

This paper presents a new pyramidal structure for image segmentation in video sequences. For this purpose, linked pyramids are generated whose nodes are calculated by averaging motion parameters, so the regions resulting from segmentation present coherent motion. The main differences between our approach and classic ones are that: (i) no explicit motion field calculation is required at pixel level; and (ii) segmentation relies on two interlinked pyramids instead of a single one. After the link structure has been iteratively rearranged, each node of the structure built over frame t is linked to a homogeneous region of cells at frame t, and also to a homogeneous region of cells at frame t−1. When this process is finished, the region at time t corresponds to the same region at time t−1. Thus, the scene is segmented not only in space but also in time.

The paper is organized as follows: Section 2 presents the new structure and the link rearranging procedure, Section 3 reports experiments and results, and Section 4 presents conclusions and future work.

Section snippets

Spatiotemporal hierarchical segmentation of consecutive pyramids

This section describes the proposed spatiotemporal segmentation procedure. First, the basic pyramidal structure is briefly introduced. The next subsection presents the new stabilization algorithm and highlights the main differences between this algorithm and previous spatial stabilization techniques. The last subsection describes the most relevant features of the proposed algorithm.

Experiments and results

The proposed algorithm provides motion estimation and segmentation as a result of the combined stabilization technique. It must be noted that these results cannot be isolated, because both are extracted from the link structure and used to generate it.

To test the validity of the proposed algorithm, it is used to track a person moving along a corridor with an azimuthal video camera. The camera transmits greyscale images of 256×256 pixels to a remote PC, which performs the

Conclusions

We have introduced a new method to achieve spatiotemporal segmentation of a video sequence by using a hierarchical structure. The method consists of generating 4-to-1 linked pyramids over the frames of the sequence and linking each pair of pyramids built over consecutive frames in a combined way. The links are then rearranged in an iterative bottom-up way to guarantee that each node is associated with a region of pixels of homogeneous grey level in both pyramids.
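The iterative link rearrangement can be illustrated on a single frame and a single pair of levels. The following is only a sketch under stated assumptions, not the authors' algorithm: it omits the interlinking between the pyramids of consecutive frames, and it assumes that each child may relink to its initial 4-to-1 parent or to one of that parent's eight neighbours, choosing the candidate whose mean grey level is closest to its own. The names `parent_means` and `relink_level`, the candidate neighbourhood and the `max_iters` cap are all hypothetical.

```python
def parent_means(children, links, half):
    """Mean grey level of the children linked to each parent (None if empty)."""
    sums = [[0.0] * half for _ in range(half)]
    counts = [[0] * half for _ in range(half)]
    for r, row in enumerate(children):
        for c, value in enumerate(row):
            pr, pc = links[r][c]
            sums[pr][pc] += value
            counts[pr][pc] += 1
    return [
        [sums[pr][pc] / counts[pr][pc] if counts[pr][pc] else None
         for pc in range(half)]
        for pr in range(half)
    ]


def relink_level(children, max_iters=50):
    """Rearrange child-to-parent links between two adjacent levels.

    Assumes `children` is a square grid with an even side length.
    Returns (links, parents): links[r][c] is the parent cell of child (r, c).
    """
    n = len(children)
    half = n // 2
    # Initial 4-to-1 structure: every 2x2 block shares one parent.
    links = [[(r // 2, c // 2) for c in range(n)] for r in range(n)]
    for _ in range(max_iters):
        parents = parent_means(children, links, half)
        changed = False
        for r in range(n):
            for c in range(n):
                best = links[r][c]
                best_diff = abs(children[r][c] - parents[best[0]][best[1]])
                # Candidates: initial parent and its 8 neighbours (assumption).
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        pr, pc = r // 2 + dr, c // 2 + dc
                        if not (0 <= pr < half and 0 <= pc < half):
                            continue
                        if parents[pr][pc] is None:
                            continue  # parent currently has no children
                        diff = abs(children[r][c] - parents[pr][pc])
                        if diff < best_diff:
                            best, best_diff = (pr, pc), diff
                if best != links[r][c]:
                    links[r][c] = best
                    changed = True
        if not changed:
            break  # link structure is stable
    return links, parent_means(children, links, half)
```

Once the links are stable, children that share a parent form one region of similar grey level. In the method summarized above, the same kind of rearrangement is applied across the two pyramids of consecutive frames, which this single-frame sketch does not attempt to reproduce.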

The main

Acknowledgements

This work has been partially supported by the Spanish Comisión Interministerial de Ciencia y Tecnología (CICYT), project number TIC098-0562.

