Semi-automatic video object segmentation using seeded region merging and bidirectional projection

https://doi.org/10.1016/j.patrec.2004.09.017

Abstract

In this paper, we propose a novel approach to semi-automatic video object segmentation. First, an interactive segmentation tool is presented that lets the user easily define the desired video objects in the first frame; it is user-friendly, flexible and efficient thanks to the proposed fast seeded region merging approach and the combination of two modes of user interaction, marker drawing and region selection. Then, a bidirectional projection approach is proposed to automatically track the video objects in the subsequent frames; it combines forward projection with backward projection to improve segmentation efficiency, and incorporates pixel classification into region classification during backward projection to guarantee more reliable tracking. Experimental results on various MPEG-4 test sequences demonstrate the efficient and faithful segmentation performance of the proposed approach.

Introduction

As an important enabler of many content-based multimedia applications supported by MPEG-4, video object segmentation remains a challenging research topic. Although human beings can easily identify different video objects in a video sequence, it is hard for a computer to automatically segment the desired video objects in generic video sequences. At present, efficient algorithms for automatic video object segmentation apply only to moving objects or to certain classes of objects for which prior knowledge is available (Fan and Elmagarmid, 2002; Fan et al., 2001; Kim and Hwang, 2002; Kim et al., 1999; Meier and Ngan, 1998; Tsaig and Averbuch, 2002). A generic automatic algorithm applicable to arbitrary video sequences seems unlikely in the near future. Therefore, a more practical solution, so-called semi-automatic video object segmentation (Cooray et al., 2001; Gatica-Perez et al., 1999; Gu and Lee, 1998a, Gu and Lee, 1998b; Guo et al., 1999; Kim et al., 2001, Kim et al., 2003; Lim et al., 2000; Luo and Eleftheriadis, 2002; Sun et al., 2003), has drawn increasing attention in recent years. A typical paradigm of semi-automatic video object segmentation consists of two steps: segmenting the first frame with user interaction to define the video objects, and automatically tracking them in the subsequent frames.

The first step is extremely important in any semi-automatic video object segmentation algorithm, because the accuracy of the segmented video objects directly determines the success or failure of the subsequent tracking process. A user-friendly segmentation tool should be provided for the user to conveniently define the video objects, and user interaction should be minimized to improve segmentation efficiency. However, the flexibility and efficiency of user interaction are rarely considered as important as the algorithm itself in most existing approaches. The most common form of user interaction is to delineate an approximate contour clinging to the video object (Guo et al., 1999; Kim et al., 2001, Kim et al., 2003). However, it is a burdensome task to move the mouse along the true object contour, especially when the shape of the object is complex. For approaches based on the snake model, a considerable number of control points around the object contour must be selected one by one (Luo and Eleftheriadis, 2002; Sun et al., 2003). Region selection is a more natural way to define a video object, but an excessive number of regions still need to be selected at different partition levels (Cooray et al., 2001). In this paper, we propose an interactive video object segmentation tool that is user-friendly, flexible and efficient, owing to the proposed fast seeded region merging approach and the combination of two modes of user interaction.
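The core idea of seeded region merging can be illustrated with a minimal sketch: starting from an over-segmented partition, regions covered by the user's markers act as labelled seeds, and every unlabelled region is greedily merged into its most similar labelled neighbour. The sketch below is an assumption-laden simplification, not the paper's implementation: regions are reduced to scalar mean intensities, adjacency is given explicitly, and the similarity measure is plain absolute difference.

```python
import heapq

def seeded_region_merging(mean_color, adjacency, seeds):
    """Greedy seeded region merging (illustrative sketch).

    mean_color: {region_id: mean intensity of the region}
    adjacency:  {region_id: set of neighbouring region_ids}
    seeds:      {region_id: label}, e.g. regions under the user's
                'object' and 'background' markers
    Returns a label for every region.
    """
    labels = dict(seeds)
    heap = []  # (similarity distance, labelled region, unlabelled neighbour)
    for r in seeds:
        for n in adjacency[r]:
            if n not in labels:
                heapq.heappush(heap, (abs(mean_color[r] - mean_color[n]), r, n))
    while heap:
        _, r, n = heapq.heappop(heap)
        if n in labels:
            continue  # already merged via a more similar neighbour
        labels[n] = labels[r]  # merge n into r's labelled group
        for m in adjacency[n]:  # frontier grows outward from the seeds
            if m not in labels:
                heapq.heappush(heap, (abs(mean_color[n] - mean_color[m]), n, m))
    return labels
```

With two seeds on a four-region chain (intensities 10, 12, 200, 205), the two dark regions end up with the object seed's label and the two bright ones with the background seed's label.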

The second step is a process of video object tracking. Many approaches adopt a two-step configuration to track the video objects (Gu and Lee, 1998a; Guo et al., 1999; Lim et al., 2000; Kim et al., 2001, Kim et al., 2003): first project the previous objects onto the current frame using some parametric motion model, and then refine the object boundaries. The underlying tracking mechanism is forward projection, which works well for rigid objects undergoing translational motion. For non-rigid objects with multiple motions, irregular boundaries and spurious holes may appear on the video objects, making post-processing for boundary refinement unavoidable. In contrast, backward projection (Gatica-Perez et al., 1999; Gu and Lee, 1998b) is well suited to non-rigid objects and needs no further refinement. Each segmented region in the current frame is projected onto the previous frame, and is assigned to the current video object if the majority of the projected region overlaps the previous video object. In essence, this is a region classification approach rather than a tracking approach. However, backward projecting all segmented regions for classification is inefficient. Another problem arises when a segmented region overlaps both the video object and the background: peninsulas or gaps appear on the video object whichever class the region is assigned to. In this paper, we propose a bidirectional projection approach, mainly as an extension of backward projection (Gu and Lee, 1998b), which is more efficient owing to its combination with forward projection, and which ensures the visual quality of the tracked video objects by incorporating pixel classification into region classification.
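The backward-projection classification rule, together with a pixel-level fallback for regions straddling the object boundary, can be sketched as follows. This is a hypothetical illustration under simplifying assumptions (pixels as coordinate pairs, a given backward motion mapping, hand-picked overlap thresholds), not the paper's exact procedure.

```python
def backward_classify(regions, project_back, prev_object, lo=0.3, hi=0.7):
    """Classify current-frame regions by backward projection (sketch).

    regions:      {region_id: list of (x, y) pixels in the current frame}
    project_back: function (x, y) -> (x, y) mapping a current-frame pixel
                  to the previous frame (assumed parametric motion model)
    prev_object:  set of (x, y) pixels of the previous video object
    A region whose projected overlap with prev_object is clearly high is
    taken as object, clearly low as background; an ambiguous region
    (overlap between lo and hi) is split per pixel instead, avoiding the
    peninsulas and gaps a whole-region decision would create.
    """
    object_pixels = set()
    for pixels in regions.values():
        hits = [p for p in pixels if project_back(*p) in prev_object]
        frac = len(hits) / len(pixels)
        if frac >= hi:            # majority inside: whole region is object
            object_pixels.update(pixels)
        elif frac > lo:           # boundary-straddling: classify per pixel
            object_pixels.update(hits)
        # frac <= lo: region stays in the background
    return object_pixels
```

In this sketch, a region half inside the previous object contributes only its overlapping pixels, whereas a fully interior region is accepted wholesale.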

This paper is organized as follows. In Section 2, an interactive video object segmentation tool is presented. Section 3 proposes our bidirectional projection approach. Experimental results for different types of the MPEG-4 test sequences are shown in Section 4. Conclusions are given in Section 5.

Section snippets

Interactive video object segmentation

To help the user easily extract the desired video object, we combine two modes of user interaction, marker drawing and region selection, in the flexible scheme shown in Fig. 1. The whole procedure of interactive video object segmentation consists of three steps: marker drawing, automatic video object extraction, and user correction. A screenshot of our graphical user interface (GUI) is shown in Fig. 2, which is exploited to clearly describe each step in the …

Automatic video object tracking

In this section, we propose a bidirectional projection approach to automatically track the extracted video objects in the subsequent frames of the video sequence. Our tracking approach can be defined as obtaining the video object VO_n of the current frame based on the motion information related to the previous video object VO_{n-1} and the spatial segmentation of the current frame. The flowchart of the proposed tracking approach is depicted in Fig. 3, which consists of three steps: …
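The forward-projection stage that makes the bidirectional scheme efficient can be sketched as follows. This is a hedged illustration under an assumed motion mapping, not the paper's implementation: the previous object mask is projected forward to obtain a coarse prediction, and only segmented regions intersecting that prediction need the more expensive backward classification.

```python
def forward_project(prev_object, motion):
    """Forward-project the previous object mask into the current frame.

    prev_object: set of (x, y) pixels of the previous video object
    motion:      function (x, y) -> (x, y) mapping previous-frame pixels
                 to current-frame positions (assumed parametric model)
    Returns a coarse prediction mask for the current frame.
    """
    return {motion(x, y) for (x, y) in prev_object}

def candidate_regions(regions, prediction):
    """Keep only the segmented regions that overlap the coarse prediction;
    regions far from it can be assigned to the background immediately."""
    return {rid: pixels for rid, pixels in regions.items()
            if any(p in prediction for p in pixels)}
```

Restricting backward classification to these candidates is what distinguishes the bidirectional scheme from plain backward projection, which would test every region in the frame.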

Experimental results

We use several MPEG-4 test sequences to evaluate the proposed approach to semi-automatic video object segmentation. Experimental results for three test sequences are shown in Figs. 5–7. These sequences represent different levels of spatial detail and movement in real situations. The first sequence, Mother and Daughter, is an MPEG-4 class A sequence with low spatial detail and a low amount of movement. The background is uniform and static, and the motion of the human bodies is relatively …

Conclusions

Video object segmentation is a necessity for MPEG-4 related multimedia applications. A novel approach to semi-automatic video object segmentation is proposed in this paper, which incorporates interactive segmentation and automatic tracking. An interactive video object segmentation tool is presented to allow the user to easily define the video objects. User interaction is more convenient due to the flexible combination of marker drawing and region selection, and is also minimized …

