A multiresolution spatiotemporal motion segmentation technique for video sequences based on pyramidal structures
Introduction
Segmentation is a tool that has been widely used in image processing and computer vision for a variety of applications. The process consists of dividing a scene into a number of regions that are homogeneous with respect to one or several of its features. Like most image processing techniques, segmentation may be computationally expensive, because images often contain a large amount of data. Pyramids have typically been used to reduce the computational load of image processing.
Tanimoto and Pavlidis (1975) introduced pyramids in image processing. They used the term to describe a multiresolution structure in which each successive level has half the dimensions of the level below. Later, Burt et al. (1981) developed the linked pyramid, which provides a framework for an iterative process of image segmentation. Unlike previous pyramids, it has explicit links representing the relationships between cells in adjacent levels. These links are iteratively rearranged so that each node is linked to a homogeneous region of nodes at every level below. Once the process has converged, the tree structure of links within the pyramid defines the segmentation of the image. Kasif and Rosenfeld (1983) proved that the process always converges. Although the structure presents some limitations, as pointed out by Bister et al. (1990), Hird and Wilson (1989) demonstrated that hierarchical segmentation on a pyramid yields better results than conventional 2-D techniques. Pietikäinen and Rosenfeld (1981) have used these structures for texture measures, and Tate and Li (1992) have applied them to stereo matching.
Pyramids have already been used for motion estimation and spatiotemporal segmentation (Luthon et al., 1999; Mahzoun et al., 1999; Tan and Martin, 1989; Mutch and Thompson, 1984). However, existing pyramid-based approaches use the structure in a coarse-to-fine way: they estimate motion with classic 2-D spatiotemporal segmentation procedures at coarse levels and then propagate the results to higher-resolution levels, correcting or predicting propagation errors. Our method instead uses the structure itself in a bottom-up way to segment consecutive frames at the same time, and it does not depend on conventional 2-D motion estimation methods. Further advantages of our approach are discussed in detail throughout the paper.
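The linked-pyramid mechanism attributed to Burt et al. (1981) can be summarised in two alternating steps. The notation here is ours, and the area-weighted update is the commonly cited form of the rule, given as an illustration rather than as the exact formulation used in this paper: g(x) is the grey level (or averaged feature) of node x, A(x) its area in base-level pixels (A = 1 at level 0), P(c) the set of candidate parents of child c (four in Burt's overlapped 4×4 scheme), and p*(c) the parent that c is currently linked to.

$$
p^*(c) = \arg\min_{p \in P(c)} \lvert g(c) - g(p) \rvert, \qquad
A(p) = \sum_{c:\,p^*(c) = p} A(c), \qquad
g(p) = \frac{1}{A(p)} \sum_{c:\,p^*(c) = p} A(c)\, g(c)
$$

The linking and averaging steps are repeated until no link changes. Because each level has half the linear dimensions of the level below, a base image of 2^n × 2^n pixels gives a pyramid with

$$
\sum_{l=0}^{n} 4^{\,n-l} = \frac{4^{\,n+1} - 1}{3} < \frac{4}{3}\,4^{n}
$$

nodes, so the whole structure needs less than one third more storage than the image itself.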
This paper presents a new pyramidal structure for image segmentation in video sequences. For this purpose, linked pyramids are generated whose nodes are computed by averaging motion parameters, so that the regions resulting from segmentation present coherent motion. The main differences between our approach and classic ones are that: (i) no explicit motion field calculation is required at pixel level; and (ii) segmentation relies on two interlinked pyramids instead of a single one. After the link structure is iteratively rearranged, each node of the structure built over frame t is linked to a homogeneous region of cells at frame t, and also to a homogeneous region of cells at frame t−1. When the process has finished, the region at time t corresponds to the region at time t−1, so the scene is segmented not only in space but also in time.
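The idea of a link structure whose regions must be homogeneous in both frames can be illustrated with a short Python sketch. It is a deliberate simplification, not the authors' algorithm: it assumes that the pyramids over frames t and t−1 share a single set of links, so each node stores a pair of grey-level means and a child links to the candidate parent that minimises the summed absolute difference over both frames. Because the links are shared, the sketch produces regions that are homogeneous in both frames but does not model displacement between them. The functions `pool`, `relink` and `segment_two_frames`, the overlapping four-candidate parent rule (after Burt et al., 1981), the plain 2×2 initialisation and the power-of-two square frames are all illustrative assumptions.

```python
import numpy as np

# Hypothetical sketch: frames t and t-1 share ONE link structure, so every node
# stores a pair of grey-level means (frame t, frame t-1). Not the authors' method.


def pool(child_val, child_area, link, size, old_val):
    """Recompute parents as area-weighted means of their linked children, per frame.

    Parents that lost all their children keep their previous values (old_val).
    """
    wsum = np.zeros((child_val.shape[0], size, size))
    asum = np.zeros((size, size))
    for k in range(child_val.shape[0]):
        np.add.at(wsum[k], link, child_area * child_val[k])
    np.add.at(asum, link, child_area)
    val = np.where(asum > 0, wsum / np.maximum(asum, 1e-12), old_val)
    return val, asum


def relink(child_val, parent_val):
    """Link each child to the candidate parent with the smallest summed |difference|
    over both frames. Candidates come from an overlapping 4x4 window with stride 2,
    giving up to four candidate parents per child (after Burt et al.)."""
    s = parent_val.shape[-1]
    ii, jj = np.indices(child_val.shape[-2:])
    rows = [np.clip((ii - 1) // 2, 0, s - 1), np.clip((ii + 1) // 2, 0, s - 1)]
    cols = [np.clip((jj - 1) // 2, 0, s - 1), np.clip((jj + 1) // 2, 0, s - 1)]
    cands = [(r, c) for r in rows for c in cols]
    cost = np.stack([np.abs(child_val - parent_val[:, r, c]).sum(axis=0) for r, c in cands])
    best = np.argmin(cost, axis=0)
    return np.choose(best, [r for r, _ in cands]), np.choose(best, [c for _, c in cands])


def segment_two_frames(frame_t, frame_prev, top, max_iter=20):
    """Return one label per pixel, valid for both frames; nodes at level `top` are roots.

    Frames are assumed square, with a power-of-two side of at least 2**top.
    """
    vals = np.stack([np.asarray(frame_t, dtype=float), np.asarray(frame_prev, dtype=float)])
    n = vals.shape[-1]
    sizes = [n >> l for l in range(top + 1)]
    # Simplified initialisation: plain 2x2 (4-to-1) block parents.
    links = [tuple(idx // 2 for idx in np.indices((sizes[l], sizes[l]))) for l in range(top)]
    values, areas = [vals], [np.ones((n, n))]
    for l in range(top):
        empty = np.zeros((2, sizes[l + 1], sizes[l + 1]))
        v, a = pool(values[l], areas[l], links[l], sizes[l + 1], empty)
        values.append(v)
        areas.append(a)
    # Bottom-up relinking sweeps until no link changes (or max_iter is reached).
    for _ in range(max_iter):
        changed = False
        for l in range(top):
            new = relink(values[l], values[l + 1])
            changed = changed or any(np.any(x != y) for x, y in zip(new, links[l]))
            links[l] = new
            values[l + 1], areas[l + 1] = pool(values[l], areas[l], links[l],
                                               sizes[l + 1], values[l + 1])
        if not changed:
            break
    # Trace every pixel up to its root: pixels sharing a root form one region.
    labels = np.arange(sizes[top] ** 2).reshape(sizes[top], sizes[top])
    for l in range(top - 1, -1, -1):
        labels = labels[links[l]]
    return labels
```

For example, with an 8×8 frame t that is uniformly dark and a frame t−1 split into a dark left half and a bright right half, `segment_two_frames(frame_t, frame_prev, top=2)` assigns the left-half and right-half pixels disjoint label sets, because the combined cost also penalises inhomogeneity in frame t−1. With `top=3` the pyramid has a single root, so every pixel receives the same label.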
The paper is organized as follows: Section 2 presents the new structure and the link rearranging procedure, Section 3 presents experiments and results, and Section 4 presents conclusions and future work.
Spatiotemporal hierarchical segmentation of consecutive pyramids
This section describes the proposed spatiotemporal segmentation procedure. First, a brief introduction to the basic pyramidal structure is presented. The next subsection presents the new stabilization algorithm and highlights the main differences between this algorithm and previous spatial stabilization techniques. The last subsection describes the most relevant features of the proposed algorithm.
Experiments and results
The proposed algorithm yields both motion estimation and segmentation as outputs of the combined stabilization technique. These two results cannot be separated, because both are extracted from the link structure and are also used to build it.
To test the validity of the proposed algorithm, it is used to track a moving person in a corridor with an azimuthal video camera. The camera transmits 256×256-pixel greyscale images to a remote PC, which performs the
Conclusions
We have introduced a new method that achieves spatiotemporal segmentation of a video sequence by using a hierarchical structure. The method consists of generating 4-to-1 linked pyramids over the frames of the sequence and linking each pair of pyramids built over consecutive frames in a combined way. Links are then rearranged iteratively in a bottom-up way to guarantee that each node is associated with a region of pixels of homogeneous grey level in both pyramids.
The main
Acknowledgements
This work has been partially supported by the Spanish Comisión Interministerial de Ciencia y Tecnología (CICYT), project number TIC098-0562.
References (16)
- Bister et al. A critical view of pyramid segmentation algorithms. Pattern Recognition Lett. (1990).
- Luthon et al. Spatiotemporal MRF approach to video segmentation: application to motion detection and lip segmentation. Signal Process. (1999).
- Mahzoun et al. A scaled multi-grid optical flow algorithm based on the least RMS error between real and estimated second images. Pattern Recognition (1999).
- Figure/ground separation using stochastic pyramid relinking. Pattern Recognition (1991).
- An analysis of a distributed multiresolution vision system. Pattern Recognition (1989).
- Tanimoto and Pavlidis. A hierarchical data structure for picture processing. Computer Graphics and Image Process. (1975).
- A computational framework and an algorithm for the measurement of visual motion. Internat. J. Comput. Vision (1989).
- Systems and experiment: performance of optical flow techniques. Internat. J. Comput. Vision (1994).