2-D mesh-based video object segmentation and tracking with occlusion resolution☆
Introduction
Object-based video representation requires segmenting the scene into video objects through robust boundary tracking, together with classification of uncovered regions as either parts of existing objects or new objects. MPEG-4 has recently emerged as a popular object-based video standard, but it specifies no normative segmentation method [5], [13]. Clearly, the utility of the standard's object-based tools for compression and manipulation of traditional video depends on the quality of the segmentation and discrimination of video objects.
Video object segmentation methods can be classified as either two-frame motion/object segmentation methods or multi-frame spatio-temporal segmentation/tracking methods. Among the former are region-based parametric motion segmentation methods [1], [2] and clustering of pixel eigenfeature vectors using the fuzzy c-means method [3]. Among the latter are blob-tracking algorithms such as P-finder [15], contour-tracking methods such as the condensation algorithm [7] and the occlusion-adaptive motion snake [6], and methods based on finding best matches of object models in the contour maps of successive frames [8]. Several 2-D mesh-based object tracking methods [4], [11], [12], [14] have also been proposed; however, they assumed that the initial object boundary was marked interactively. Region- or mesh-based methods have in general been shown to be more robust than pixel-based segmentation methods [1], [11]. In this paper, we propose a unified 2-D mesh-based approach for fully automatic video object segmentation and tracking, which fuses node-based motion and triangle-based color information instead of using a pixel-based approach.
At the first frame, a number of feature points are selected as nodes of a coarse 2-D content-based mesh. These points are classified as foreground and background nodes based on node motion analysis over the next N frames, yielding a coarse estimate of the foreground object boundary. Color differences across triangles near the coarse boundary are exploited in a maximum contrast path search, subject to search control constraints, along the edges of the 2-D mesh to refine the boundary of the video object. Next, we propagate this refined boundary to the subsequent frame by using the motion vectors of the node points to form the coarse boundary at the next frame, which is then refined by the same maximum contrast path search. Because motion estimation cannot be perfect for all nodes and there may be occlusion regions, the 2-D mesh topology needs to be updated [4] at certain locations. The boundaries of newly uncovered regions are then refined using the same 2-D mesh topology and search mechanism. These regions are re-meshed and either appended to the foreground object or tracked as new objects. The segmentation procedure is re-initialized when the detected occlusion regions exceed a given percentage of the video object area.
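The node classification step above can be sketched as follows. This is a minimal illustration, not the paper's exact criterion: it assumes node trajectories over the N analysis frames are available (e.g., from block matching at the node points) and uses accumulated displacement as the motion-activity measure; the threshold value is a hypothetical parameter.

```python
import numpy as np

def classify_nodes(trajectories, motion_threshold=2.0):
    """Classify mesh nodes as foreground (moving) or background (static)
    from their accumulated displacement over N frames.

    trajectories : array of shape (num_nodes, N+1, 2), the (x, y) position
                   of each node in frames 0..N.
    Returns a boolean array, True for foreground nodes.
    """
    # Frame-to-frame displacement vectors for each node.
    displacements = np.diff(trajectories, axis=1)            # (nodes, N, 2)
    # Accumulated path length over N frames acts as a simple multi-frame
    # motion filter, suppressing spurious single-frame matching noise.
    path_length = np.linalg.norm(displacements, axis=2).sum(axis=1)
    return path_length > motion_threshold
```

The convex hull (or boundary triangles) of the nodes classified as foreground then provides the coarse object boundary to be refined.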
The organization of the paper is as follows. The proposed 2-D mesh-based segmentation method, formulated as a constrained maximum contrast path search problem, is discussed in Section 2. Section 3 presents a 2-D mesh tracking method with occlusion detection and mesh update. Experimental results are given in Section 4. Conclusions and future directions are discussed in Section 5.
2-D mesh-based video object segmentation with multi-frame motion filtering
This section presents a coarse-to-fine hierarchical 2-D mesh-based video object segmentation algorithm. First, a coarse boundary of the video object is estimated based on feature (node) point selection and multi-frame node motion analysis, as discussed in Section 2.1. Next, refinement of this coarse boundary is formulated as a constrained maximum contrast path search based on node point motion vectors and the colors within the triangles, as explained in Section 2.2.
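One simple realization of the maximum contrast path search is a shortest-path search over the mesh graph in which each edge is weighted inversely to the color contrast between the two triangles it separates, so that a minimum-cost path favors high-contrast edges. The sketch below (an assumption, not the paper's exact constrained formulation, which additionally restricts the search region near the coarse boundary) uses plain Dijkstra:

```python
import heapq

def max_contrast_path(edges, contrast, start, goal, eps=1e-6):
    """Find a boundary segment from start to goal along mesh edges that
    favors high color contrast, via Dijkstra with edge cost 1/(contrast+eps).

    edges    : dict node -> list of neighboring nodes in the mesh
    contrast : dict frozenset({u, v}) -> color contrast across edge (u, v)
    """
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue
        visited.add(u)
        if u == goal:
            break
        for v in edges.get(u, []):
            # Low cost for high-contrast edges; eps avoids division by zero.
            nd = d + 1.0 / (contrast[frozenset((u, v))] + eps)
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    # Reconstruct the node sequence of the refined boundary segment.
    path, node = [goal], goal
    while node != start:
        node = prev[node]
        path.append(node)
    return path[::-1]
```

In the paper's setting, the search control constraints would restrict `edges` to mesh edges within a band around the coarse boundary.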
2-D mesh tracking with occlusion and new object detection
This section deals with the tracking of the refined boundary from the previous frame to the current frame in the presence of self-occlusion (out-of-plane rotation or articulated motion) and object-to-object occlusion. The tracking algorithm includes three main steps: uncovered-region detection, classification of occlusion type, and boundary refinement. We consider only occlusions that relate to the object boundary; that is, we do not attempt to detect self-occlusions that lie completely within the object.
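The uncovered-region detection and re-initialization test described above can be sketched as follows. This is a simplified illustration under assumed inputs: each triangle's motion-compensated prediction error (mean absolute color difference after warping the previous frame by the node motion) and its pixel area; both threshold values are hypothetical parameters.

```python
import numpy as np

def detect_occlusion(prediction_error, triangle_area,
                     error_threshold=20.0, reinit_fraction=0.25):
    """Flag mesh triangles with high motion-compensated prediction error
    (likely occlusion/uncovered regions), and decide whether to
    re-initialize the segmentation.

    prediction_error : per-triangle mean absolute color error after warping
    triangle_area    : per-triangle area in pixels
    Returns (occluded mask, re-initialize flag).
    """
    prediction_error = np.asarray(prediction_error, dtype=float)
    triangle_area = np.asarray(triangle_area, dtype=float)
    occluded = prediction_error > error_threshold
    # Re-initialize when the detected occlusion regions exceed a given
    # percentage of the video object area.
    occluded_share = triangle_area[occluded].sum() / triangle_area.sum()
    return occluded, occluded_share > reinit_fraction
```

Flagged triangles would then be re-meshed and either appended to the foreground object or tracked as new objects, as described in the introduction.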
Experimental results
We demonstrate the proposed segmentation and tracking method on three sequences: frames 20–100 of “Mother and Daughter”, frames 40–75 of “Hall”, and frames 8–19 of “Hamburg Taxi”, with frame increments of 2, 2 and 1, respectively. The 20th, 40th and 8th frames of these sequences are shown in Fig. 3, Fig. 4, Fig. 5, respectively. We consider “Mother and Daughter” as a single large foreground video object (VO). The man in “Hall”, in contrast, is a relatively small VO with articulated motion.
Conclusions
A 2-D mesh-based hierarchical segmentation and tracking method with occlusion detection has been proposed. The results show that the method can successfully discriminate between multiple moving objects and track them in the presence of self-occlusion and object-to-object occlusion. An important consideration for the mesh-based tracking approach is that the mesh topology may need to be updated in the presence of articulated motion (e.g., legs crossing each other or hand movements) as well as object-to-object occlusion.
References (15)
- A. Alatan, L. Onural, M. Wollborn, R. Mech, E. Tuncel, T. Sikora, Image sequence analysis for emerging interactive...
- Y. Altunbaşak, P.E. Eren, A.M. Tekalp, Region-based parametric motion segmentation using color information, Graphical...
- R. Castagno, T. Ebrahimi, M. Kunt, Video segmentation based on multiple features for interactive multimedia...
- I. Celasun, A.M. Tekalp, Optimal 2D hierarchical content-based mesh design and update for object-based video, IEEE...
- L. Chiariglione, MPEG and multimedia communications, IEEE Trans. Circuits Systems Video Technol. 7 (1) (February 1997)...
- Y. Fu, A.T. Erdem, A.M. Tekalp, Tracking visible boundary of objects using occlusion-adaptive motion snake, IEEE Trans....
- M. Isard, A. Blake, Condensation – Conditional density propagation for visual tracking, Internat. J. Comput. Vision 29...
☆ This work was supported by TÜBİTAK under contract EEEAG-198E011.