Elsevier

Real-Time Imaging

Volume 11, Issue 4, August 2005, Pages 290-299
Robust moving object segmentation on H.264/AVC compressed video using the block-based MRF model

https://doi.org/10.1016/j.rti.2005.04.008

Abstract

Moving object segmentation in the compressed domain plays an important role in many real-time applications, e.g. video indexing, video transcoding and video surveillance. Because H.264/AVC is a recent video-coding standard, little work has been reported on video analysis of H.264/AVC compressed video. Compared with the earlier MPEG standards, H.264/AVC employs several new coding tools and provides a different video format. As a consequence, moving object segmentation on H.264/AVC compressed video is a new and challenging task. In this paper, a robust approach to extracting moving objects from H.264/AVC compressed video is proposed. Our algorithm employs a block-based Markov Random Field (MRF) model to segment moving objects from the sparse motion vector field obtained directly from the bitstream. In the proposed method, object tracking is integrated into the uniform MRF model, which simultaneously exploits the temporal consistency of objects. Experiments show that our approach performs well and extracts moving objects efficiently and robustly. Prominent applications of the proposed algorithm include object-based transcoding, fast moving object detection and video analysis on compressed video.

Introduction

Moving object segmentation aims at partitioning an image sequence into meaningful regions along the time axis. In general, moving object segmentation techniques in the pixel domain have to fully decode the compressed video first. Such algorithms are quite accurate but cannot meet the requirements of real-time applications. As a consequence, fast algorithms that segment moving objects directly on compressed video are desired. This type of video processing in the compressed domain plays an important role in many real-time applications, e.g. video indexing, video coding, video surveillance and video manipulation [1], [2], [3], [4].

Moving object segmentation algorithms in the compressed domain usually rely on two types of macroblock (MB) features: motion vectors (MVs) and DCT coefficients. MVs are obtained block by block during motion compensation between the current frame and its reference frames. An MV represents the temporal correlation between two frames and gives the displacement of the block. All MVs in one frame can be treated as a sparse motion vector field. The DCT coefficients of an MB, on the other hand, carry the image information. For an inter-coded block, the DCT coefficients contain the residues of the motion compensation. For an intra-coded block, the DCT coefficients are the transformed signal of the original image. Therefore, the block DCT coefficients can be used to reconstruct the DC image [5] or treated as a texture feature to measure the similarity of blocks [6].
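To illustrate how a DC image follows from block DCT coefficients, here is a minimal sketch. It assumes an orthonormal N×N DCT-II applied to intra-coded blocks, and it ignores quantization and level shifts; under that assumption the DC coefficient equals N times the block mean. The function names `dc_coefficient` and `dc_image` are hypothetical and are not taken from the paper or from any codec API.

```python
from typing import List

Block = List[List[float]]


def dc_coefficient(block: Block) -> float:
    """DC term of an orthonormal N x N DCT-II: (1/N) * sum of all samples."""
    n = len(block)
    return sum(sum(row) for row in block) / n


def dc_image(dc_coeffs: List[List[float]], n: int = 8) -> List[List[float]]:
    """Recover one mean intensity per block from its DC coefficient (assumed N = n)."""
    return [[dc / n for dc in row] for row in dc_coeffs]


# A flat 8x8 block with intensity 100 has DC = 8 * 100 under this scaling.
flat_block = [[100.0] * 8 for _ in range(8)]
dc = dc_coefficient(flat_block)
thumbnail = dc_image([[dc]])
```

The resulting image has one sample per block, which is why it can be built without a full inverse transform. Inter-coded blocks need an additional motion-compensated approximation that this sketch does not cover.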

However, as a recent video coding standard, H.264/AVC employs several new coding tools and provides a different video format [7]. As a consequence, moving object segmentation on H.264/AVC compressed video is a challenging task, and very little work has been carried out on video analysis of H.264/AVC compressed video. From the video analysis point of view, H.264/AVC has some new characteristics. First, an intra-coded block is spatially intra-predicted from its neighboring pixels, so its DCT coefficients now carry the spatial prediction residues rather than the transformed image signal. Second, H.264/AVC supports variable block-size motion compensation: an MB may be partitioned into several blocks and have several MVs. As a result, the MV field of H.264/AVC compressed video consists of MVs with variable block sizes, which is quite different from the regular block-size MVs of the earlier MPEG standards. Therefore, an efficient object extraction technique for H.264/AVC compressed video is required. In this paper, we propose a novel object segmentation algorithm that uses the block-based Markov Random Field (MRF) model to extract moving objects based on the MVs and block DC coefficients with decoding. The proposed approach treats object segmentation as a Markovian block labeling process and simultaneously integrates object tracking in the uniform MRF model. The method can extract moving objects efficiently and robustly.
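One common way to handle MVs with variable block sizes is to resample them onto a uniform grid of small blocks, so that every grid cell inherits the MV of the partition covering it. The sketch below assumes a 4×4 grid and a hypothetical input format (one tuple per partition, in pixels); it is an illustration of the idea, not necessarily the normalization used by the authors.

```python
from typing import List, Optional, Tuple

# Hypothetical partition record: (x, y, width, height, mv_x, mv_y), all in pixels.
Partition = Tuple[int, int, int, int, float, float]
MV = Optional[Tuple[float, float]]


def to_uniform_mv_field(partitions: List[Partition], frame_w: int,
                        frame_h: int, block: int = 4) -> List[List[MV]]:
    """Spread each partition's MV over the block x block cells it covers.

    Cells not covered by any inter-coded partition (e.g. intra blocks) stay None.
    """
    cols, rows = frame_w // block, frame_h // block
    field: List[List[MV]] = [[None] * cols for _ in range(rows)]
    for x, y, w, h, mvx, mvy in partitions:
        for r in range(y // block, (y + h) // block):
            for c in range(x // block, (x + w) // block):
                if 0 <= r < rows and 0 <= c < cols:
                    field[r][c] = (mvx, mvy)
    return field


# One 16x16 macroblock split into a 16x8 partition and a single 8x8 partition;
# the remaining 8x8 quadrant is assumed to carry no MV.
example = to_uniform_mv_field(
    [(0, 0, 16, 8, 1.0, 0.0), (0, 8, 8, 8, 0.0, 2.0)], frame_w=16, frame_h=16)
```

A uniform grid lets later stages treat every cell identically, at the cost of duplicating the MV of large partitions across many cells.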

This paper is organized as follows. Section 2 briefly reviews related work on moving object segmentation in the MPEG compressed domain. An overview of the proposed algorithm is then presented in Section 3. Our algorithm consists of two stages, MV classification and MRF classification, which are detailed in Sections 4 and 5, respectively. The process of I-frame segmentation is discussed in Section 6. Experimental results are presented in Section 7, and concluding remarks are provided in Section 8.

Related work

Recently, some moving object segmentation algorithms for the MPEG compressed domain have been reported. In MPEG compressed video, pictures are encoded as I-frames, P-frames and B-frames. P-frames and B-frames store the motion information and the residues of the motion compensation, whereas an I-frame has no motion information and stores the DCT-transformed signal of the original image. Thus, an I-frame can provide texture or color information without decoding. Most of the object segmentation algorithms in

Overview of the proposed algorithm

Three types of temporally interleaved frames are supported in the H.264/AVC bitstream. The first is the I-frame, which is intra-coded on 16×16 or 4×4 pixel blocks. The second is the P-frame, which is motion compensated in the forward direction from an I-frame or another P-frame. The third is the B-frame, which is motion compensated in both directions. A group-of-pictures (GOP) refers to the frames between I-frames. Because consecutive P-frames can provide continuous motion information through the whole video, only

Motion vector classification

Because only MVs that correspond to true motion provide reliable motion information, the MVs related to real objects should be identified first. Since MVs are produced by a purely coding-oriented criterion, the MV field is quantized and noisy. This is a fundamental drawback that keeps video-processing algorithms on compressed video from achieving satisfactory results. In order to filter out noise and recover the true MV, we design a classification process to determine

MRF classification

Moving object segmentation can be treated as a Markovian labeling procedure on the classified MV field. The MRF model captures the spatial continuity that is inherent in natural images, and is used to guide the block merging process under the maximum a posteriori (MAP) criterion. In the proposed model, three clues are taken into account and integrated in the uniform MRF model. The first clue is the MV similarity that is used to merge blocks into the motion homogeneous region. The second clue is the
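The general idea of Markovian block labeling under a MAP criterion can be illustrated with a generic sketch. It is not the authors' energy function: it assumes only two labels (0 for background, 1 for moving object), a data term based on MV similarity to each label's mean MV, a Potts smoothness prior with a hypothetical weight `beta`, and iterated conditional modes (ICM) as the optimizer. All names and parameters here are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Vec = Tuple[float, float]
NEIGHBORS = ((-1, 0), (1, 0), (0, -1), (0, 1))


def icm_label(field: List[List[Vec]], init: List[List[int]],
              beta: float = 1.0, iters: int = 5) -> List[List[int]]:
    """Approximate MAP labeling of a block MV field with ICM (generic sketch)."""
    rows, cols = len(field), len(field[0])
    labels = [row[:] for row in init]
    for _ in range(iters):
        # Re-estimate the mean MV of each label from the current labeling.
        means: Dict[int, Vec] = {}
        for k in (0, 1):
            vs = [field[r][c] for r in range(rows) for c in range(cols)
                  if labels[r][c] == k]
            if vs:
                means[k] = (sum(v[0] for v in vs) / len(vs),
                            sum(v[1] for v in vs) / len(vs))
        changed = False
        for r in range(rows):
            for c in range(cols):
                vx, vy = field[r][c]
                best, best_e = labels[r][c], None
                for k, (mx, my) in means.items():
                    data = (vx - mx) ** 2 + (vy - my) ** 2   # MV similarity
                    smooth = sum(1 for dr, dc in NEIGHBORS   # Potts prior
                                 if 0 <= r + dr < rows and 0 <= c + dc < cols
                                 and labels[r + dr][c + dc] != k)
                    energy = data + beta * smooth
                    if best_e is None or energy < best_e:
                        best, best_e = k, energy
                if best != labels[r][c]:
                    labels[r][c] = best
                    changed = True
        if not changed:
            break
    return labels


# A 2x2 moving region plus one isolated noisy MV at the top-right corner.
mv_field = [[(0.0, 0.0)] * 4 for _ in range(4)]
for r, c in ((1, 1), (1, 2), (2, 1), (2, 2)):
    mv_field[r][c] = (4.0, 0.0)
mv_field[0][3] = (1.5, 0.0)
# Initial labels from a simple magnitude threshold (an assumption).
initial = [[1 if abs(vx) + abs(vy) > 1.0 else 0 for vx, vy in row]
           for row in mv_field]
result = icm_label(mv_field, initial)
```

In this toy example the smoothness prior relabels the isolated noisy block as background while the coherent 2×2 region keeps its object label. A larger `beta` favors smoother label maps at the cost of eroding small objects.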

Object label projection for I-frame

In H.264/AVC compressed video, only P-frames and B-frames have MVs. An I-frame is an intra-coded picture and has no MVs. Two intra block coding types are defined in H.264/AVC: Intra_4×4 and Intra_16×16. Both types of blocks are coded by spatial prediction from the adjacent pixels. The Intra_4×4 type corresponds to a block of 4×4 size, and the Intra_16×16 type to a 16×16 block. Thus, the number of blocks in an I-frame varies and depends on the frame content.
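The dependence of the block count on content can be made concrete with a small sketch. It assumes each 16×16 macroblock is coded either as one Intra_16×16 block or as sixteen Intra_4×4 blocks, as described above; the string labels and the function name `count_intra_blocks` are hypothetical.

```python
from typing import Iterable

# Blocks produced per 16x16 macroblock for each intra coding type.
BLOCKS_PER_MB = {"Intra_16x16": 1, "Intra_4x4": 16}


def count_intra_blocks(mb_types: Iterable[str]) -> int:
    """Total number of intra blocks in an I-frame, given one type per macroblock."""
    return sum(BLOCKS_PER_MB[t] for t in mb_types)


# A CIF frame (352x288) contains 22 x 18 = 396 macroblocks.
MBS_PER_CIF_FRAME = (352 // 16) * (288 // 16)
smooth_frame = count_intra_blocks(["Intra_16x16"] * MBS_PER_CIF_FRAME)
detailed_frame = count_intra_blocks(["Intra_4x4"] * MBS_PER_CIF_FRAME)
mixed_frame = count_intra_blocks(["Intra_4x4"] * 100 + ["Intra_16x16"] * 296)
```

Under these assumptions a CIF I-frame holds between 396 and 6336 blocks, depending on how many macroblocks the encoder chooses to code with the finer Intra_4×4 type.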

To obtain the segmentation

Experimental results

The proposed algorithm is evaluated on two outdoor surveillance sequences. The first is the PIE sequence in CIF (352×288) format, captured with a long focal length on a highway. The second is the ETRI_od_A sequence in SIF (352×240) format, a middle focal length sequence selected from the 30th CD of the MPEG-7 content set. Figs. 3 and 4 show two original frames of the sequences. In our experiments, both the test sequences are compressed using the H.264 encoder of version JM

Conclusions and future work

This paper presents a robust approach to segmenting moving objects on H.264/AVC compressed video. Using the block-based MRF model, the proposed algorithm efficiently segments moving objects from the noisy MV field. Our method can be used in object-based transcoding, fast moving object detection, video analysis on compressed video, etc. Although the proposed approach is designed for H.264/AVC, it can be easily extended to other video formats, such as MPEG-1, MPEG-2 and H.261. The algorithm is

Acknowledgements

This work is supported by the NEC-JDL joint project and the One Hundred Talents Plan of the Chinese Academy of Sciences. The authors thank Dr. Siwei Ma for providing the H.264/AVC codec and the anonymous reviewers for their valuable and constructive comments.

References (19)

  • M.K. Mandal et al. A critical evaluation of image and video indexing techniques in the compressed domain. Image and Vision Computing (1999)
  • H.J. Zhang et al. Video parsing and browsing using compressed data. Multimedia Tools and Applications (1995)
  • S.F. Chang. Exploring functionalities in the image/video compressed domain. ACM Computing Surveys (1995)
  • S.W. Lee et al. Fast scene change detection using direct feature extraction from MPEG compressed video. IEEE Transactions on Multimedia (2000)
  • B.L. Yeo et al. Rapid scene analysis on compressed video. IEEE Transactions on Circuits and Systems for Video Technology (1995)
  • S. Ji et al. Moving object segmentation in DCT-based compressed video. Electronics Letters (2000)
  • T. Wiegand et al. Overview of the H.264/AVC video coding standard. IEEE Transactions on Circuits and Systems for Video Technology (2003)
  • H. Zen et al. Moving object detection from MPEG coded picture. Proceedings of the IEEE International Conference on Image Processing (1999)
  • R.V. Babu et al. Video object segmentation: a compressed domain approach. IEEE Transactions on Circuits and Systems for Video Technology (2004)
There are more references available in the full text version of this article.
