Fast variable-size block motion estimation for efficient H.264/AVC encoding

https://doi.org/10.1016/j.image.2004.11.003

Abstract

In this paper, an efficient algorithm is proposed to reduce the computational complexity of variable-size block-matching motion estimation. We first investigate multiple candidate search centers, adaptive initial block-sizes, search patterns, and search step-sizes, so that the search matches different motion characteristics and block-sizes. To avoid being trapped in local minima, the proposed algorithm uses multiple candidate motion vectors obtained from different block-sizes. To further reduce the computation cost, a threshold-based early-stop strategy that depends on the quantization parameter is suggested. With adaptive initial block-sizes, a merge-or-skip strategy is also proposed to reduce the computation for the final block-size decision. For the H.264/AVC encoder, simulations show that the proposed algorithms run about 2.6–3.9 times faster than the original JM v6.1d encoder, which uses fast full-search for all block-sizes, while maintaining comparable rate-distortion performance.

Introduction

Motion estimation (ME) and motion compensation are critical components in video coding systems. Block-matching ME, which searches for the most similar block in certain areas of a previously encoded frame, reduces the temporal redundancy of the block and represents it with a displacement motion vector. The fixed block-size motion estimation (FBME) with a 16×16 block-size achieves good coding efficiency and has been adopted widely in international standards, such as MPEG-1 [1], MPEG-2 [2], H.261 [3], and H.263 [4]. After removing the temporal redundancy, the residual block can be compressed by spatial transformation and quantization to remove the spatial redundancy. The motion vectors and the quantized transform coefficients can be further compacted by differential and variable-length coding techniques. It is well known that the FBME is the most computationally demanding function in the aforementioned video coders.

Many fast and efficient algorithms have been developed in the past decades to overcome the computation bottleneck of the FBME. Usually, these fast algorithms reduce the computation by using either simple matching criteria or efficient search patterns that assume a unimodal error surface. The matching criteria include the sum of squared differences (SSD) (or mean square error (MSE)), the sum of absolute differences (SAD) (or mean absolute difference (MAD)), etc. Efficient search patterns comprise three-step search (TSS) [6], new three-step search (NTSS) [7], four-step search [8], diamond search (DS) [9], and hexagon-based search (HBS) [10]. The DS method is particularly effective for slow-motion video sequences. The HBS algorithm is attractive since its complexity and quality performance are close to those of DS, and its speed-up is especially evident for large-motion video. The ME algorithms using efficient search patterns usually suffer from being trapped at local minima. In [11], the diversity-based search strategy (DSS) combines the strengths of the DS and TSS search patterns to cover different kinds of motion and avoid being trapped at local minima. Some ME algorithms focus on selecting a better search-center from the motion vectors of adjacent blocks and co-located blocks in the previous frame [12], [13], [14]. The search-center could be obtained from several pre-checked search-points that result in acceptable matching errors in the search area. Starting from a better search-center not only reduces the number of search-points but also improves the performance, since it reduces the probability of being trapped at local minima. The concept of initializing a better search-center, or predicting the motion vector, can be combined with the fast algorithms described above.
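As an illustration of the SAD criterion and of a pattern-based search, the following Python sketch implements a simplified diamond search on frames stored as lists of rows. It is a minimal sketch, not the algorithm of [9] in full detail: the function names, the square block of side `n`, the default search range, and the tie-breaking rule (the current center wins ties) are all assumptions made here for brevity.

```python
# Simplified diamond search with the SAD criterion (illustrative sketch only).
# Frames are lists of rows of integer pixel values; all names are hypothetical.

LARGE_DIAMOND = [(0, 0), (0, -2), (0, 2), (-2, 0), (2, 0),
                 (-1, -1), (1, -1), (-1, 1), (1, 1)]
SMALL_DIAMOND = [(0, 0), (0, -1), (0, 1), (-1, 0), (1, 0)]


def sad(cur, ref, bx, by, vx, vy, n=4):
    """SAD between the n x n block of `cur` at (bx, by) and the block of
    `ref` displaced by the candidate motion vector (vx, vy)."""
    total = 0
    for j in range(n):
        for i in range(n):
            total += abs(ref[by + j + vy][bx + i + vx] - cur[by + j][bx + i])
    return total


def diamond_search(cur, ref, bx, by, n=4, search_range=4, start=(0, 0)):
    """Greedy search: repeat the large diamond until its center is the best
    point, then refine once with the small diamond. `start` must be valid."""
    height, width = len(ref), len(ref[0])

    def valid(vx, vy):
        # Keep the vector inside the search range and the block inside `ref`.
        return (abs(vx) <= search_range and abs(vy) <= search_range
                and 0 <= bx + vx and bx + vx + n <= width
                and 0 <= by + vy and by + vy + n <= height)

    def best_of(center, pattern):
        points = [(center[0] + dx, center[1] + dy) for dx, dy in pattern]
        points = [p for p in points if valid(*p)]
        # min() keeps the first minimum, so the center wins ties.
        return min(points, key=lambda p: sad(cur, ref, bx, by, p[0], p[1], n))

    center = start
    while True:
        best = best_of(center, LARGE_DIAMOND)
        if best == center:
            break
        center = best
    return best_of(center, SMALL_DIAMOND)
```

Because the search only ever moves to a strictly better point, it stops at the first point that no diamond neighbour improves on, which is why a good starting center matters when the error surface is not unimodal.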

Recently, some researchers demonstrated that variable block-size motion estimation (VBME) gives better coding efficiency than FBME [16], [17], [18]. Hence, H.263+ [4] and MPEG-4 [5] further suggest the 8×8 ME mode to improve the coding efficiency. The advanced video coding standard, H.264/AVC [15], extends the VBME to seven different block-sizes and multiple reference frames to further improve the overall performance. The VBME with multiple reference frames requires much heavier computation than the FBME.

In H.264/AVC, there are 16×16, 16×8, 8×16, and 8×8 block modes, where the 8×8 block mode can be further classified into 8×8, 8×4, 4×8 and 4×4 sub-block modes, as shown in Fig. 1. To search for the best block-partition with these seven possible block-sizes, applying fast FBME algorithms to each block-size is an intuitive solution to reduce the VBME computation. Thus, some modified FBME strategies [19], [20], [21] have been suggested for the H.264/AVC standard.
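The block-size hierarchy described above can be enumerated in a few lines. The Python sketch below is illustrative only: the names `MACROBLOCK_MODES`, `SUB_BLOCK_MODES`, `block_sizes`, and `block_origins` are made up here, sizes are written as (width, height), and the macroblock is assumed to be tiled uniformly with a single block-size.

```python
# Enumerating the H.264/AVC inter block-sizes (illustrative sketch only).

MACROBLOCK = 16
MACROBLOCK_MODES = [(16, 16), (16, 8), (8, 16), (8, 8)]
SUB_BLOCK_MODES = [(8, 8), (8, 4), (4, 8), (4, 4)]  # partitions of one 8x8 block


def block_sizes():
    """Distinct block-sizes, largest first; 8x8 appears in both lists."""
    sizes = []
    for size in MACROBLOCK_MODES + SUB_BLOCK_MODES:
        if size not in sizes:
            sizes.append(size)
    return sizes


def block_origins(width, height):
    """Top-left (x, y) of every block when one 16x16 macroblock is tiled
    uniformly with blocks of the given size."""
    return [(x, y)
            for y in range(0, MACROBLOCK, height)
            for x in range(0, MACROBLOCK, width)]


if __name__ == "__main__":
    for w, h in block_sizes():
        print(f"{w}x{h}: {len(block_origins(w, h))} motion vector(s) per macroblock")
```

Each block-size covers all 256 pixels of the macroblock, so searching every size independently repeats the matching work once per size unless partial results are reused.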

In this paper, we first extend the DSS [11] to VBME by initiating multiple search-centers, and adaptively exploit search strategies with different search step-sizes for different block-sizes. To reduce the computation, an inter-mode decision is applied to select an initial block-size for ME. Merge and early termination strategies are further suggested to save computation substantially. The rest of the paper is organized as follows. In Section 2, we first briefly analyze the behavior of motion vector predictions from spatial candidates and the possible motion vector inferences from different block-sizes for the H.264 encoder. Then, we extend the diversity-based ME algorithm to include different search initializations and search step-sizes that adapt to various block-sizes to achieve efficient ME. In Section 3, we propose early termination of motion search, adaptation of initial block-sizes, merge-or-skip, and compose-and-refine strategies to further reduce the VBME computation. Finally, experimental results and conclusions are given in Sections 4 and 5, respectively.

Variable block-size motion estimation

To achieve the best ME for a block-size $N_1\times N_2$, we should find the motion vector $(v_x(N_1,N_2), v_y(N_1,N_2))$ by minimizing the Lagrangian cost

$$J_{N_1\times N_2}(v_x,v_y)=\mathrm{SAD}_{N_1\times N_2}(v_x,v_y)+\lambda R\big(\mathrm{MVD}_{N_1\times N_2}(v_x,v_y)\big), \tag{1}$$

where the SAD between the current and the reference block is given by

$$\mathrm{SAD}_{N_1\times N_2}(v_x,v_y)=\sum_{i=1}^{N_1}\sum_{j=1}^{N_2}\left|\hat{f}_m(i+v_x,j+v_y)-f(i,j)\right|. \tag{2}$$

In (2), $f(i,j)$ and $\hat{f}_m(i,j)$ denote the pixels in the current and the $m$th coded frames, respectively. In (1), the motion vector difference (MVD) is expressed as $\mathrm{MVD}_{N_1\times N_2}(v_x,v_y)=|v_x(N_1,N_2)-v$
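The Lagrangian cost in (1) can be sketched in Python as follows. Since the MVD definition is cut off in the text above, this sketch makes its own assumptions: the MVD is taken component-wise against a caller-supplied predicted vector `pred`, and the rate R is approximated by the length of the signed Exp-Golomb code that H.264/AVC uses for such values under CAVLC. The function names and the example value of λ are hypothetical.

```python
# Lagrangian motion cost J = SAD + lambda * R(MVD) (illustrative sketch only).


def se_golomb_bits(value):
    """Bit length of the signed Exp-Golomb code of an integer.
    Mapping: 0 -> 0, 1 -> 1, -1 -> 2, 2 -> 3, -2 -> 4, ..."""
    code_num = 2 * abs(value) - (1 if value > 0 else 0)
    return 2 * (code_num + 1).bit_length() - 1


def mv_rate(mv, pred):
    """Assumed rate model: bits for both MVD components (mv - pred)."""
    return se_golomb_bits(mv[0] - pred[0]) + se_golomb_bits(mv[1] - pred[1])


def lagrangian_cost(sad_value, mv, pred, lam):
    """Eq. (1): distortion (SAD) plus lambda times the motion-vector rate."""
    return sad_value + lam * mv_rate(mv, pred)


def best_vector(sad_by_mv, pred, lam):
    """Pick the candidate with the lowest Lagrangian cost.
    `sad_by_mv` maps each candidate vector to its precomputed SAD."""
    return min(sad_by_mv,
               key=lambda mv: lagrangian_cost(sad_by_mv[mv], mv, pred, lam))
```

For instance, with `pred = (0, 0)` and `lam = 4`, a candidate (3, 1) with SAD 100 loses to (0, 0) with SAD 110, because its longer vector code costs more than the 10 units of distortion it saves.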

Computation reduction algorithms

In this section, we propose several further algorithms to reduce computation. We first develop the zero-block condition to stop unnecessary searches early. We also suggest an adaptive selection algorithm that determines an initial block-size for ME according to the motion vectors or prediction errors obtained from the initial block-type. With the merge-or-skip strategy, we can avoid searching all block-sizes. The detailed descriptions of computation reduction
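A threshold-based early stop of the kind mentioned above might look like the following Python sketch. The actual zero-block condition and threshold are not given in this excerpt, so `hypothetical_threshold` and its `scale` parameter are placeholders invented for illustration, not the authors' formula. The only standard fact used is that the H.264/AVC quantizer step size doubles for every increase of 6 in QP.

```python
# Threshold-based early termination of a motion search (illustrative sketch).

# H.264/AVC quantizer step sizes for QP = 0..5; the step doubles every 6 QP.
QSTEP_BASE = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp):
    """Quantizer step size for a given quantization parameter."""
    return QSTEP_BASE[qp % 6] * (2 ** (qp // 6))


def hypothetical_threshold(qp, width, height, scale=1.0):
    """Placeholder threshold (an assumption, not the paper's): proportional
    to the step size and to the number of pixels in the block."""
    return scale * qstep(qp) * width * height


def search_with_early_stop(candidates, cost_fn, threshold):
    """Check candidates in order and stop as soon as the best cost so far
    falls below the threshold. Returns (best_mv, best_cost, points_checked)."""
    best_mv, best_cost, checked = None, float("inf"), 0
    for mv in candidates:
        cost = cost_fn(mv)
        checked += 1
        if cost < best_cost:
            best_mv, best_cost = mv, cost
        if best_cost < threshold:
            break  # good enough: skip the remaining search points
    return best_mv, best_cost, checked
```

A larger QP raises the threshold, so coarser quantization lets the search stop after fewer points; how aggressive this should be is a tuning decision that the sketch leaves to `scale`.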

Experimental results

To evaluate our proposed algorithm, we integrate it into the reference software JM v6.1d [24] of H.264/AVC. Some important parameters are set as follows: (1) sequence type is IPPP…; (2) search range is 33×33; (3) number of reference frames is 1; (4) Hadamard transform is used; (5) entropy coding method is CAVLC; (6) no rate-distortion optimization (RDO); and (7) no random intra macroblock refresh in P pictures. Video sequences such as Table Tennis, Foreman, Stefan, Mother and Daughter, and

Conclusions

In this paper, we first propose an extended diversity-based (EDSS) ME algorithm. It uses adaptive step-size and search-pattern for different block-sizes. By using motion vectors obtained from the current macroblock with different block-sizes, and from the adjacent blocks, the EDSS algorithm is very close to FS in terms of the rate-distortion performance. We then develop a threshold-based early stop strategy to reduce the unnecessary computations. Further, a merge-or-skip strategy, which can be

References (31)

  • ISO/IEC, Information Technology-Coding of Moving Pictures and Associated Audio for Digital Storage Media at up to About...
  • ISO/IEC, Information Technology-Generic Coding of Moving Pictures and Associated Audio Information: Video,...
  • Video codec for audiovisual services at p×64 kbit/s, ITU-T Recommendation H.261, March...
  • Video coding for low bitrate communication, ITU-T Recommendation H.263 Version 1, 1995. Version 2, September...
  • ISO/IEC, Information Technology-Generic Coding of Audio-Visual Objects Part 2: Visual, ISO/IEC 14496-2 (MPEG-4 Video),...
  • T. Koga, K. Iinuma, A. Iijima, T. Ishiguro, Motion-compensated interframe coding for video conferencing, in:...
  • R. Li et al., A new three-step search algorithm for block motion estimation, IEEE Trans. Circuits Systems Video Technol. (1994)
  • L.M. Po et al., A novel four-step search algorithm for fast block motion estimation, IEEE Trans. Circuits Systems Video Technol. (1996)
  • S. Zhu et al., A new diamond search algorithm for fast block matching motion estimation, IEEE Trans. Image Process. (2000)
  • C. Zhu et al., Hexagon-based search pattern for fast block motion estimation, IEEE Trans. Circuits Systems Video Technol. (2002)
  • J. Xin et al., Diversity-based fast block motion estimation, Proceedings of IEEE International Conference on Multimedia and Expo 2003 (ICME’03) (2003)
  • K.-L. Chung et al., A new predictive search area approach for fast block motion estimation, IEEE Trans. Image Process. (2003)
  • J. Chalidabhongse et al., Fast motion vector estimation using multiresolution-spatio-temporal correlations, IEEE Trans. Circuits Systems Video Technol. (1997)
  • Y.-L. Chan et al., An efficient search strategy for block motion estimation using image features, IEEE Trans. Image Process. (2001)
  • T. Wiegand, G. J. Sullivan, A. Luthra, Draft ITU-T Recommendation H.264 and Final Draft International Standard 14496-10...

This research was partially supported by the National Science Council under Contract #NSC-92-2213-E006-023 and the Opto-Electronics and Systems Laboratories, Industrial Technology Research Institute under Contract #93S18-S3, Taiwan.