An extension of direct macroblock coding in Predictive (P) slices of the H.264 standard

https://doi.org/10.1016/j.jvcir.2005.04.008

Abstract

Better motion compensation for inter-frame prediction is one of the key reasons why the new H.264 (MPEG-4 AVC) video coding standard achieves considerably better coding efficiency than older standards such as MPEG-2, MPEG-4, and H.263. These techniques include the use of multiple reference pictures and block sizes, a better interpolation filter for subpixel motion compensation, and more efficient exploitation of the spatio-temporal correlation between the motion vectors of adjacent blocks through the SKIP and DIRECT modes. In this paper, we introduce additional methods into H.264 that further enhance motion compensation and can lead to further improvements in coding efficiency. This is achieved by further exploiting the temporal correlation of motion vectors, through the introduction of a new DIRECT macroblock type and an enhancement to the existing SKIP macroblock type within Predictive (P) slices. These new macroblock types can considerably reduce the bits required to encode motion information while retaining, or even improving, quality under a rate-distortion optimization framework. Our simulation results suggest that the proposed improvements can yield up to 7.6% average bitrate reduction, or equivalently a 0.39 dB quality improvement, over the current H.264 standard.

Introduction

The new H.264 (also known as MPEG-4 AVC, H.26L, or JVT) video coding standard [1] claims an improvement of up to 50% in coding efficiency compared to previous standards [2]. This is achieved through better coding tools, such as more efficient adaptive entropy coders, intra block prediction, and an in-loop adaptive deblocking filter, but most importantly through more efficient motion compensation for inter block prediction. Unlike previous standards, H.264 uses an improved interpolation filter for motion prediction down to quarter-pixel accuracy, multiple references, and tree-structured macroblocks (MBs) based on a quad-tree partitioning scheme, under which different partitions within an MB can be assigned, and predicted from, different motion information. The quad-tree structure allows an MB to be coded in four different modes, with partition sizes of 16 × 16, 16 × 8, 8 × 16, and 8 × 8; in the 8 × 8 mode, each 8 × 8 partition can be further split into 8 × 8, 8 × 4, 4 × 8, or 4 × 4 sub-partitions (Fig. 1). Although this feature allows more accurate prediction of motion within a picture, it also implies that a significant number of bits must be spent to signal the motion information of an MB. For example, an MB within a Predictive (P) slice may have up to 16 different motion vectors (MVs). The problem becomes even more severe in Bi-Predictive (B) slices, where a partition can be predicted using two MVs pointing to pictures in two separate prediction lists (list0 and list1), resulting in up to 32 different MVs per MB.
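To make the motion-vector count concrete, the following Python sketch enumerates the partition sizes listed above and computes the worst-case number of MVs per MB, assuming one MV per partition per prediction list. It is our own illustration, not text from the standard; names such as blocks_in and max_mvs_per_mb are hypothetical.

```python
# Illustrative only: H.264 inter partition shapes (width, height) in luma samples.
MB_SIZE = 16
MB_PARTITIONS = [(16, 16), (16, 8), (8, 16), (8, 8)]
SUB_PARTITIONS = [(8, 8), (8, 4), (4, 8), (4, 4)]  # used only inside the 8 x 8 mode


def blocks_in(area_w: int, area_h: int, part_w: int, part_h: int) -> int:
    """Number of part_w x part_h blocks that tile an area_w x area_h region."""
    return (area_w // part_w) * (area_h // part_h)


def max_mvs_per_mb(num_lists: int) -> int:
    """Worst case: 8 x 8 mode with every 8 x 8 partition split into 4 x 4 blocks."""
    smallest_w, smallest_h = min(SUB_PARTITIONS, key=lambda p: p[0] * p[1])
    num_8x8 = blocks_in(MB_SIZE, MB_SIZE, 8, 8)        # 4 partitions
    per_8x8 = blocks_in(8, 8, smallest_w, smallest_h)  # 4 sub-partitions each
    return num_8x8 * per_8x8 * num_lists               # one MV per block per list


if __name__ == "__main__":
    for w, h in MB_PARTITIONS:
        print(f"{w} x {h}: {blocks_in(MB_SIZE, MB_SIZE, w, h)} partition(s)")
    print("P slice (list0 only):", max_mvs_per_mb(1))     # 16
    print("B slice (list0 + list1):", max_mvs_per_mb(2))  # 32
```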

Motion vectors are already coded relatively efficiently, since each one is coded differentially with respect to an MV predictor, taken as the median of the MVs of the spatially adjacent blocks on the left, top, and top-right (or top-left if the top-right is not available). Even so, the H.264 standard introduced additional methods that further reduce the number of bits spent on motion information while still providing good motion-compensated prediction. More specifically, the standard adopted the concepts of SKIP and DIRECT MB types [1] within P and B slices, respectively, under which motion information is inferred directly from previously decoded MBs. Although these types already existed in prior standards, their semantics have been considerably improved to take better advantage of motion correlation. In particular, given the high reliability of the median MV predictor, SKIP MBs are in most cases assumed to have motion information equal to the median predictor of a 16 × 16 MB (Fig. 2).
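The median predictor and the differential coding it enables can be sketched in Python as follows. This is a simplified illustration under stated assumptions: neighbouring MVs are passed as (x, y) tuples, or None when unavailable; unavailable neighbours contribute zero motion; and the H.264 rule that selects a single neighbour's MV when only that neighbour shares the current reference index is deliberately omitted. The function names are hypothetical.

```python
from typing import Optional, Tuple

MV = Tuple[int, int]  # (horizontal, vertical), e.g. in quarter-sample units


def median3(a: int, b: int, c: int) -> int:
    """Median of three integers."""
    return a + b + c - min(a, b, c) - max(a, b, c)


def median_mv_predictor(left: Optional[MV], top: Optional[MV],
                        top_right: Optional[MV], top_left: Optional[MV]) -> MV:
    """Component-wise median of the left (A), top (B) and top-right (C) MVs.

    Simplified sketch: reference indices are ignored; None means unavailable.
    """
    c = top_right if top_right is not None else top_left  # C falls back to top-left
    b = top
    if b is None and c is None and left is not None:
        b = c = left  # only the left neighbour exists: its MV becomes the predictor
    a, b, c = (mv if mv is not None else (0, 0) for mv in (left, b, c))
    return (median3(a[0], b[0], c[0]), median3(a[1], b[1], c[1]))


def mv_difference(mv: MV, pred: MV) -> MV:
    """The differential MV that would actually be entropy coded."""
    return (mv[0] - pred[0], mv[1] - pred[1])


pred = median_mv_predictor((4, 0), (6, -2), (-1, 3), None)
print(pred, mv_difference((5, 1), pred))  # (4, 0) (1, 1)
```

Taking the median per component makes the predictor robust to a single outlying neighbour, which is why a small residual difference is usually all that remains to be coded.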

Nevertheless, to better estimate motion at object/background boundaries, zero motion is used instead if the left or top adjacent partition used to compute this predictor has zero motion (and references the first list0 picture), or if the MB lies in the first column or row of a slice. Furthermore, no residual information is transmitted for a SKIP MB, which enables additional methods, such as run-length coding of the MB mode types, to further enhance the performance of this mode.

DIRECT coded MBs, on the other hand, can infer motion either spatially or temporally. The choice between spatial and temporal prediction is signaled in the slice header, so all DIRECT MBs within a slice use either Spatial [3] or Temporal mode. Spatial prediction for DIRECT MBs is quite similar to that of SKIP MBs, although in this case up to two different MVs can be generated for each list, and a different condition, based on temporal information, is used for handling motion at object boundaries. Finally, in the temporal case motion is inferred from the motion information of previously decoded pictures, based on the assumption that an object moves with relatively constant speed. In this case, the MVs for each list are derived by temporally interpolating the MVs of the co-located MB in the first list1 reference picture (Fig. 3). Given the relatively high reliability of DIRECT prediction, and unlike SKIP mode in P slices, the H.264 standard allows two alternative DIRECT MB modes: one without any residual, which can also use run-length coding, and a second that requires the transmission of a residual.
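The SKIP and temporal DIRECT inference rules described above can be sketched in Python. The SKIP rule follows the H.264 condition, in which a zero-motion neighbour only forces zero motion if it uses reference index 0; the temporal DIRECT scaling follows the standard's fixed-point DistScaleFactor arithmetic, with the long-term reference case omitted. The median predictor is taken as an input, and the names Neighbour, p_skip_mv, and temporal_direct_mvs are our own illustrative choices. Here tb is the picture order count distance from the current picture to its list0 reference, and td is the distance from the list1 reference to that list0 reference.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

MV = Tuple[int, int]


@dataclass
class Neighbour:
    ref_idx: int  # list0 reference index of the neighbouring partition
    mv: MV


def p_skip_mv(left: Optional[Neighbour], top: Optional[Neighbour], median_pred: MV) -> MV:
    """SKIP MV in a P slice: zero motion at likely object boundaries, else the median."""
    if left is None or top is None:  # e.g. first MB column or row of a slice
        return (0, 0)
    for n in (left, top):
        if n.ref_idx == 0 and n.mv == (0, 0):
            return (0, 0)
    return median_pred


def clip3(lo: int, hi: int, x: int) -> int:
    return max(lo, min(hi, x))


def div_trunc(n: int, d: int) -> int:
    """Integer division truncating toward zero (the meaning of '/' in H.264)."""
    q = abs(n) // abs(d)
    return q if (n >= 0) == (d > 0) else -q


def temporal_direct_mvs(mv_col: MV, tb: int, td: int) -> Tuple[MV, MV]:
    """Scale the co-located MV into list0 and list1 MVs (constant-velocity model)."""
    if td == 0:
        return mv_col, (0, 0)
    tb, td = clip3(-128, 127, tb), clip3(-128, 127, td)
    tx = div_trunc(16384 + abs(div_trunc(td, 2)), td)
    scale = clip3(-1024, 1023, (tb * tx + 32) >> 6)  # DistScaleFactor
    mv_l0 = ((scale * mv_col[0] + 128) >> 8, (scale * mv_col[1] + 128) >> 8)
    mv_l1 = (mv_l0[0] - mv_col[0], mv_l0[1] - mv_col[1])
    return mv_l0, mv_l1


print(temporal_direct_mvs((8, -4), tb=1, td=2))  # ((4, -2), (-4, 2))
```

With the B picture halfway between its two references, the list0 MV is half the co-located MV and the list1 MV points the opposite way, which is exactly the constant-speed interpolation the text describes.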

In this paper, we introduce an additional method to further improve coding efficiency by extending the concept of the temporal DIRECT mode to P slices through a new mode, named the DirectP mode, which, unlike SKIP mode (which exploits spatial correlation), exploits the temporal correlation between adjacent pictures. Furthermore, we present a simple modification to the existing SKIP MB mode, based on assumptions similar to those of the spatial DIRECT mode, which can improve coding performance even further. In Section 2, we first introduce the new DirectP MB mode. The modifications to the semantics of SKIP mode and further extensions of the DirectP MB mode are given in Section 3, followed by simulation results and our conclusion. Note that our proposed techniques were previously presented in [4] and are currently being considered as potential extensions to the H.264 standard [5].
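The DirectP derivation itself is given in Section 2 of the full text and is not reproduced in this excerpt. Purely as a hypothetical illustration of the constant-velocity idea behind it, the Python sketch below extrapolates a co-located MV within a P slice. It assumes that the co-located block in the most recent list0 reference picture has an MV spanning td picture intervals, and it scales that MV linearly to the tb intervals between the current picture and its own reference. The function names, the linear scaling, and the rounding rule are our assumptions, not the authors' specification.

```python
from typing import Tuple

MV = Tuple[int, int]


def scale_round(c: int, tb: int, td: int) -> int:
    """Round c * tb / td to the nearest integer, halves away from zero (tb, td > 0)."""
    num = abs(c) * tb
    q = (2 * num + td) // (2 * td)
    return q if c >= 0 else -q


def directp_mv_hypothetical(mv_col: MV, tb: int, td: int) -> MV:
    """HYPOTHETICAL: extrapolate the co-located MV under a constant-velocity assumption.

    mv_col: MV of the co-located block, covering td picture intervals.
    tb: picture intervals between the current picture and its list0 reference.
    """
    if tb <= 0 or td <= 0:
        raise ValueError("both temporal distances must be positive in this sketch")
    return (scale_round(mv_col[0], tb, td), scale_round(mv_col[1], tb, td))


print(directp_mv_hypothetical((8, -6), tb=1, td=2))  # (4, -3)
```

Rounding halves away from zero keeps the sketch symmetric for positive and negative motion; an actual codec would fix the rounding in its own fixed-point arithmetic.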

Section snippets

DIRECT prediction

As previously discussed, DIRECT prediction can considerably improve coding efficiency within B slices. The temporal DIRECT mode in particular allows up to 32 MVs (16 for each list) to be inferred without transmitting any other information. In this mode, the basic assumption is that objects tend to move with relatively constant velocity, which allows the motion information of the current MB to be predicted using simple interpolation. More specifically, the list0 and list1 MVs

Spatio-temporal skip and DIRECT prediction extensions

The SKIP MB mode is currently the most efficient inter mode for P slices in H.264. Nevertheless, it would be desirable to improve it further. Since this mode uses only spatial information to infer its motion, incorporating temporal information appears to be a feasible route. However, we have found that completely replacing the SKIP mode MV derivation with a method similar to DirectP could impair performance, especially when

Simulation results

The DirectP mode and the semantic modification to the SKIP MB mode were implemented in version 4.3a of the H.264 reference software [8]. For our simulations, we selected five sequences: the QCIF (176 × 144) sequences Container and News coded at 10 fps, and the CIF (352 × 288) sequences Mobile, Bus, and Flowergarden coded at 30 fps. The CAVLC entropy coder was used in all experiments, and the sequences were encoded with QP values of 28, 32, 36, and 40, a

Conclusion

In this paper, a new inter MB mode was presented that exploits the temporal correlation of MVs and can be incorporated into H.264 or other video coding standards and architectures to enhance their performance. An additional semantic change was also introduced to the SKIP MB mode of H.264, enabling the joint consideration of temporal and spatial correlation when deriving the mode's motion information. Simulation results demonstrate that our proposed methods can achieve considerably better

References (9)

  • Advanced video coding for generic audiovisual services,...
  • T. Wiegand, G.J. Sullivan, G. Bjontegaard, A. Luthra, Overview of the H.264/AVC Video Coding Standard, in: IEEE...
  • A.M. Tourapis, F. Wu, S. Li, Direct mode coding for bi-predictive pictures in the JVT standard, in: Proceedings of the...
  • A.M. Tourapis, F. Wu, S. Li, Direct prediction for Predictive (P) and bi-Predictive (B) frames in video coding, ISO/IEC...
There are more references available in the full text version of this article.

1

This work was done while the author was with Microsoft Research Asia.
