Semi-automatic video object segmentation using seeded region merging and bidirectional projection

https://doi.org/10.1016/j.patrec.2004.09.017

Abstract

In this paper, we propose a novel approach to semi-automatic video object segmentation. First, an interactive segmentation tool is presented that lets the user easily define the desired video objects in the first frame; it is user-friendly, flexible and efficient thanks to the proposed fast seeded region merging approach and the combination of two modes of user interaction, marker drawing and region selection. Then, a bidirectional projection approach is proposed to automatically track the video objects in the subsequent frames; it combines forward projection with backward projection to improve segmentation efficiency, and incorporates pixel classification into region classification during backward projection to guarantee more reliable tracking. Experimental results on various MPEG-4 test sequences demonstrate the efficient and faithful segmentation performance of the proposed approach.

Introduction

As an important enabler of many content-based multimedia applications supported by MPEG-4, video object segmentation remains a challenging research topic. Although human beings can easily identify different video objects in a video sequence, it is hard for a computer to automatically segment the desired video objects in generic video sequences. At present, efficient algorithms for automatic video object segmentation apply only to moving objects or to certain classes of objects for which prior knowledge is available (Fan and Elmagarmid, 2002; Fan et al., 2001; Kim and Hwang, 2002; Kim et al., 1999; Meier and Ngan, 1998; Tsaig and Averbuch, 2002). A generic automatic algorithm applicable to arbitrary video sequences seems unlikely in the near future. Therefore, a more practical solution, so-called semi-automatic video object segmentation (Cooray et al., 2001; Gatica-Perez et al., 1999; Gu and Lee, 1998a, Gu and Lee, 1998b; Guo et al., 1999; Kim et al., 2001, Kim et al., 2003; Lim et al., 2000; Luo and Eleftheriadis, 2002; Sun et al., 2003), has drawn increasing attention in recent years. A typical paradigm of semi-automatic video object segmentation consists of two steps: segmenting the first frame with user interaction to define the video objects, and automatically tracking them in the subsequent frames.

The first step is extremely important in any semi-automatic video object segmentation algorithm, because the accuracy of the segmented video objects directly determines the success or failure of the subsequent tracking process. A user-friendly segmentation tool should be provided for the user to conveniently define the video objects, and user interaction should be minimized to improve segmentation efficiency. However, the flexibility and efficiency of user interaction are rarely considered as important as the algorithm itself in most existing approaches. The most common form of user interaction is to delineate an approximate contour clinging to the video object (Guo et al., 1999; Kim et al., 2001, Kim et al., 2003). However, it is a burdensome task to move the mouse along the true object contour, especially when the shape of the object is complex. For approaches based on the snake model, a considerable number of control points around the object contour must be selected one by one (Luo and Eleftheriadis, 2002; Sun et al., 2003). Region selection is a more natural way to define a video object, but an excessive number of regions still need to be selected at different partition levels (Cooray et al., 2001). In this paper, we propose an interactive video object segmentation tool that is user-friendly, flexible and efficient, owing to the proposed fast seeded region merging approach and the combination of two modes of user interaction.
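The core idea of seeded region merging can be illustrated with a minimal sketch: starting from an over-segmented partition, regions covered by the user's markers act as labelled seeds, and every unlabelled region is greedily merged into its most similar labelled neighbour. The sketch below is an assumption-laden simplification, not the paper's implementation: regions are reduced to scalar mean intensities, adjacency is given explicitly, and the similarity measure is plain absolute difference.

```python
import heapq

def seeded_region_merging(mean_color, adjacency, seeds):
    """Greedy seeded region merging (illustrative sketch).

    mean_color: {region_id: mean intensity of the region}
    adjacency:  {region_id: set of neighbouring region_ids}
    seeds:      {region_id: label}, e.g. regions under the user's
                'object' and 'background' markers
    Returns a label for every region.
    """
    labels = dict(seeds)
    heap = []  # (similarity distance, labelled region, unlabelled neighbour)
    for r in seeds:
        for n in adjacency[r]:
            if n not in labels:
                heapq.heappush(heap, (abs(mean_color[r] - mean_color[n]), r, n))
    while heap:
        _, r, n = heapq.heappop(heap)
        if n in labels:
            continue  # already merged via a more similar neighbour
        labels[n] = labels[r]  # merge n into r's labelled group
        for m in adjacency[n]:  # frontier grows outward from the seeds
            if m not in labels:
                heapq.heappush(heap, (abs(mean_color[n] - mean_color[m]), n, m))
    return labels
```

With two seeds on a four-region chain (intensities 10, 12, 200, 205), the two dark regions end up with the object seed's label and the two bright ones with the background seed's label.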

The second step is a process of video object tracking. Many approaches adopt a two-step configuration to track the video objects (Gu and Lee, 1998a; Guo et al., 1999; Lim et al., 2000; Kim et al., 2001, Kim et al., 2003): first project the previous objects onto the current frame using some parametric motion model, and then refine the object boundaries. The underlying tracking mechanism is forward projection, which works well for rigid objects undergoing translational motion. For non-rigid objects with multiple motions, irregular boundaries and spurious holes may appear on the video objects, making post-processing for boundary refinement unavoidable. In contrast, backward projection (Gatica-Perez et al., 1999; Gu and Lee, 1998b) is well suited to non-rigid objects and needs no further refinement. Each segmented region in the current frame is projected onto the previous frame, and is assigned to the current video object if the majority of the projected region overlaps the previous video object. In essence, this is a region classification approach rather than a tracking approach. However, backward projecting all segmented regions for classification is inefficient. Another problem arises when a segmented region overlaps both the video object and the background: peninsulas or gaps appear on the video object whichever class the region is assigned to. In this paper, we propose a bidirectional projection approach, mainly as an extension of backward projection (Gu and Lee, 1998b), which is more efficient owing to its combination with forward projection, and which ensures the visual quality of the tracked video objects by incorporating pixel classification into region classification.
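The backward-projection classification rule, together with a pixel-level fallback for regions straddling the object boundary, can be sketched as follows. This is a hypothetical illustration under simplifying assumptions (pixels as coordinate pairs, a given backward motion mapping, hand-picked overlap thresholds), not the paper's exact procedure.

```python
def backward_classify(regions, project_back, prev_object, lo=0.3, hi=0.7):
    """Classify current-frame regions by backward projection (sketch).

    regions:      {region_id: list of (x, y) pixels in the current frame}
    project_back: function (x, y) -> (x, y) mapping a current-frame pixel
                  to the previous frame (assumed parametric motion model)
    prev_object:  set of (x, y) pixels of the previous video object
    A region whose projected overlap with prev_object is clearly high is
    taken as object, clearly low as background; an ambiguous region
    (overlap between lo and hi) is split per pixel instead, avoiding the
    peninsulas and gaps a whole-region decision would create.
    """
    object_pixels = set()
    for pixels in regions.values():
        hits = [p for p in pixels if project_back(*p) in prev_object]
        frac = len(hits) / len(pixels)
        if frac >= hi:            # majority inside: whole region is object
            object_pixels.update(pixels)
        elif frac > lo:           # boundary-straddling: classify per pixel
            object_pixels.update(hits)
        # frac <= lo: region stays in the background
    return object_pixels
```

In this sketch, a region half inside the previous object contributes only its overlapping pixels, whereas a fully interior region is accepted wholesale.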

This paper is organized as follows. In Section 2, an interactive video object segmentation tool is presented. Section 3 proposes our bidirectional projection approach. Experimental results for different types of the MPEG-4 test sequences are shown in Section 4. Conclusions are given in Section 5.

Section snippets

Interactive video object segmentation

To help the user easily extract the desired video object, we combine two modes of user interaction, marker drawing and region selection, in the flexible scheme shown in Fig. 1. The whole procedure of interactive video object segmentation consists of three steps: marker drawing, automatic video object extraction, and user correction. A screenshot of our graphical user interface (GUI) is shown in Fig. 2, which is exploited to clearly describe each step in the …

Automatic video object tracking

In this section, we propose a bidirectional projection approach to automatically track the extracted video objects in the subsequent frames of the video sequence. Our tracking approach can be defined as obtaining the video object VO_n of the current frame based on the motion information related to the previous video object VO_{n-1} and the spatial segmentation of the current frame. The flowchart of the proposed tracking approach is depicted in Fig. 3, which consists of three steps: …
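The forward-projection stage that makes the bidirectional scheme efficient can be sketched as follows. This is a hedged illustration under an assumed motion mapping, not the paper's implementation: the previous object mask is projected forward to obtain a coarse prediction, and only segmented regions intersecting that prediction need the more expensive backward classification.

```python
def forward_project(prev_object, motion):
    """Forward-project the previous object mask into the current frame.

    prev_object: set of (x, y) pixels of the previous video object
    motion:      function (x, y) -> (x, y) mapping previous-frame pixels
                 to current-frame positions (assumed parametric model)
    Returns a coarse prediction mask for the current frame.
    """
    return {motion(x, y) for (x, y) in prev_object}

def candidate_regions(regions, prediction):
    """Keep only the segmented regions that overlap the coarse prediction;
    regions far from it can be assigned to the background immediately."""
    return {rid: pixels for rid, pixels in regions.items()
            if any(p in prediction for p in pixels)}
```

Restricting backward classification to these candidates is what distinguishes the bidirectional scheme from plain backward projection, which would test every region in the frame.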

Experimental results

We use several MPEG-4 test sequences to evaluate the proposed approach to semi-automatic video object segmentation. Experimental results for three test sequences are shown in Figs. 5–7. These sequences represent different levels of spatial detail and movement in real situations. The first sequence, Mother and Daughter, is an MPEG-4 class A sequence with low spatial detail and a low amount of movement. The background is uniform and static, and the motion of the human bodies is relatively …

Conclusions

Video object segmentation is a necessity for MPEG-4 related multimedia applications. A novel approach to semi-automatic video object segmentation is proposed in this paper, which incorporates interactive segmentation and automatic tracking. An interactive video object segmentation tool is presented to allow the user to easily define the video objects. User interaction is more convenient due to the flexible combination of marker drawing and region selection, and is also minimized …

